Abstract

Iterated regret minimization was recently introduced by J.Y. Halpern and R. Pass for classical strategic games. For many games of interest, this solution concept provides solutions that are judged more reasonable than those offered by traditional game concepts such as Nash equilibrium. Although computing the iterated regret on explicit matrix games is conceptually and computationally easy, nothing is known about computing it on games whose matrices are defined implicitly by game trees, game DAGs or, more generally, game graphs. In this paper, we investigate iterated regret minimization for infinite duration two-player quantitative non-zero sum games played on graphs.

We consider reachability objectives that are not necessarily antagonistic. Edges are weighted by integers (one for each player), and the payoffs are defined by the sum of the weights along the paths. Depending on the class of graphs, we give either polynomial or pseudo-polynomial time algorithms to compute a strategy that minimizes the regret for a fixed player. We finally give algorithms to compute the strategies of the two players that minimize the iterated regret for trees, and for graphs with strictly positive weights only.

1 Introduction 

The analysis of complex interactive systems like embedded systems or distributed systems is a major challenge of computer aided verification. Zero-sum games on graphs provide a good framework to model interactions between a component and an environment, as they are strictly competitive. However, in the context of modern interactive systems, several components may interact and be controlled independently. Non-zero sum games on graphs are more accurate models of such systems, as the objectives are not necessarily antagonistic. There are initial results in this area but a large number of questions are open. In this paper, we adapt to game graphs a new solution concept for non-zero sum games initially defined for strategic games.

In 0, J.Y. Halpern and R. Pass defined the notion of 
iterated regret minimization. This solution concept as- 
sumes that instead of trying to minimize what she has 
to pay, each player tries to minimize her regret. The regret is informally defined as the difference between what a player actually pays and what she could have paid if she had known the strategy chosen by the other player. More formally, if $u_1(\lambda_1, \lambda_2)$ represents what Player 1 pays when the pair of strategies $(\lambda_1, \lambda_2)$ is played, then $\mathrm{reg}_1(\lambda_1, \lambda_2) = u_1(\lambda_1, \lambda_2) - \min_{\lambda'_1} u_1(\lambda'_1, \lambda_2)$.

Let us illustrate this on an example. Consider the strategic game defined by the matrix of Figure 1. In the game underlying this matrix, Player 1 has two strategies $A_1$ and $B_1$ and Player 2 has two strategies $A_2$ and $B_2$. The two players choose a strategy at the same time and the pairs of strategies define what the two players have to pay.¹ The regret of playing $A_1$ for Player 1 if Player 2 plays $A_2$ is equal to 1 because $u_1(A_1, A_2)$ is 2 whereas $u_1(B_1, A_2)$ is 1. Knowing that Player 2 plays $A_2$, Player 1 should have played $B_1$.

As players have to choose strategies before knowing how the adversary will play, we associate a regret with each strategy as follows. The regret of a strategy $\lambda_1$ of Player 1 is $\mathrm{reg}_1(\lambda_1) = \max_{\lambda_2} \mathrm{reg}_1(\lambda_1, \lambda_2)$. In the example, the regret attached to strategy $A_1$ is equal to 1, because when Player 2 plays $A_2$, Player 1's regret is 1, and when Player 2 plays $B_2$ her regret is 0. A rational player should minimize her regret. The regret for Player 1 is thus defined as $\mathrm{reg}_1 = \min_{\lambda_1} \mathrm{reg}_1(\lambda_1)$. Summarizing, we get $\mathrm{reg}_1 = \min_{\lambda_1} \max_{\lambda_2} \big(u_1(\lambda_1, \lambda_2) - \min_{\lambda'_1} u_1(\lambda'_1, \lambda_2)\big)$. A symmetrical definition can be given for Player 2's regret.

1 We could have considered rewards instead of penalties; everything is symmetrical.

Figure 1: A strategic game with explicit penalty matrix.

Let us come back to the example. The regret attached to strategy $B_1$ is equal to 1. So the two strategies of Player 1 are equivalent w.r.t. regret minimization. On the other hand, for Player 2, the regret of $A_2$ equals 0, and the regret of $B_2$ equals 3. So, if Player 1 makes the hypothesis that Player 2 is trying to minimize her regret, then she must conclude that Player 2 will play $A_2$. Knowing that, Player 1 recomputes her regret for each action, and in this case, the regret of action $A_1$ is 1 while the regret of $B_1$ is 0. So rational players minimizing their regret should end up playing the pair $(B_1, A_2)$ in this game.

Reasoning on rationality is formalized by Halpern and Pass by introducing a delete operator that erases strictly dominated strategies. This operator takes sets of strategies $(\Lambda_1, \Lambda_2)$ for each player and returns $D(\Lambda_1, \Lambda_2) = (\Lambda'_1, \Lambda'_2)$, the strategies that minimize regret. Then $D(\Lambda'_1, \Lambda'_2)$ returns the strategies that minimize regret under the hypothesis that adversaries minimize their regret, i.e., choose their strategies in $\Lambda'_1$ and $\Lambda'_2$ respectively. In the case of finite matrix games, this operator is monotone and converges to the strategies that minimize regret for the two players, each assuming that the other player is rational.
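The iteration of the delete operator on an explicit matrix can be sketched in a few lines of Python. The matrix of Figure 1 is not reproduced in this text, so the penalties below are hypothetical: they were chosen only to agree with the values quoted in the example ($u_1(A_1,A_2) = 2$, $u_1(B_1,A_2) = 1$, regrets 1 and 1 for Player 1, 0 and 3 for Player 2). The function names are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

Penalties = Dict[Tuple[str, str], Tuple[int, int]]

# Hypothetical penalty matrix: (row strategy, column strategy) -> (penalty 1, penalty 2).
PENALTIES: Penalties = {
    ("A1", "A2"): (2, 1), ("A1", "B2"): (3, 4),
    ("B1", "A2"): (1, 2), ("B1", "B2"): (4, 3),
}


def regrets(penalties: Penalties, rows: List[str], cols: List[str],
            player: int) -> Dict[str, int]:
    """Regret of every remaining strategy of `player` (1 = rows, 2 = columns)."""
    own, other = (rows, cols) if player == 1 else (cols, rows)

    def cost(mine: str, theirs: str) -> int:
        key = (mine, theirs) if player == 1 else (theirs, mine)
        return penalties[key][player - 1]

    # reg(s) = max over adversary strategies t of cost(s, t) - best cost against t
    return {s: max(cost(s, t) - min(cost(alt, t) for alt in own) for t in other)
            for s in own}


def iterated_regret_minimization(penalties: Penalties, rows: List[str],
                                 cols: List[str]) -> Tuple[List[str], List[str]]:
    """Apply the delete operator until the two strategy sets stabilize."""
    while True:
        r1 = regrets(penalties, rows, cols, 1)
        r2 = regrets(penalties, rows, cols, 2)
        new_rows = [s for s in rows if r1[s] == min(r1.values())]
        new_cols = [s for s in cols if r2[s] == min(r2.values())]
        if new_rows == rows and new_cols == cols:
            return rows, cols
        rows, cols = new_rows, new_cols
```

On this matrix the first round removes $B_2$ only, and the second round removes $A_1$, which matches the reasoning of the example.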

In this paper, we consider games where the matrix is not given explicitly but defined implicitly by a game graph. More precisely, we consider graphs where vertices are partitioned into vertices that belong to Player 1 and vertices that belong to Player 2. Each edge is annotated by a penalty for Player 1 and one for Player 2. Additionally, there are two designated sets of vertices, one that Player 1 tries to reach and the other one that Player 2 tries to reach. The game starts in the initial vertex of the graph and is played for an infinite number of rounds as follows. In each round, the player who owns the vertex on which the pebble is placed moves the pebble to an adjacent vertex using an edge of the graph, and a new round starts. An infinite play generates an infinite sequence of vertices, and the amounts that the players have to pay are computed as follows. Player 1 pays $+\infty$ if the sequence does not reach the target set assigned to Player 1; otherwise she pays the sum of edge costs assigned to her on the prefix up to the first visit to her target set. The amount to pay for Player 2 is defined symmetrically. Strategies in such games are functions from the set of histories of plays (sequences of visited vertices) to edges (choice of moves for the pebble).

Let us consider the game graph of Fig. 2. This is a formalization of the so-called Centipede game [9] in our game graphs. We have considered a 5-round variant here; this game can be generalized to any number of rounds. Initially, the pebble is on vertex A. Player 1 owns the circle vertices and Player 2 owns the square vertices. The target objective for the two players is the same: they both want to reach vertex S. At each round, one of the players has to choose either to stop the game and reach the target, or to let the game continue for at least an additional round. The penalties attached to edges are given as pairs of integers (the first for Player 1 and the second for Player 2). Strategies here are as follows. For each circle vertex, Player 1 must decide either to continue or to go to the target S, and symmetrically for Player 2. It can be shown (and computed by our algorithms) that the strategy of Player 1 that survives iterated regret minimization is the strategy that stops the game only in position E, and the strategy for Player 2 is the strategy that continues the game to vertex D. This pair of strategies has a penalty of (1,3). This is an interesting and rather nice joint behavior of the two players in comparison with what Nash equilibrium predicts for this example. Indeed, the only Nash equilibrium² in that game is the pair of strategies where the two players decide to stop the game directly, and so they have to pay (5,7). This is a 5-round example, but the difference between the penalty of the Nash equilibrium and that of the iterated regret grows as the number of rounds increases.



2 A Nash equilibrium is a pair of strategies where no player has an incentive to change her strategy if the other player keeps playing her strategy.

Figure 2: Centipede Game

Contributions We investigate iterated regret minimization for infinite duration two-player quantitative non-zero sum games played on graphs. We focus on reachability objectives that are not necessarily antagonistic.

We first consider target-weighted arenas, where the payoff function is defined for each state of the objectives. We give a PTIME algorithm to compute the regret by reduction to a min-max game.

We then consider edge-weighted arenas. Each edge is labeled by a pair of integers (one for each player), and the payoffs are defined by the sum of the weights along the path until the first visit to an objective. We give a pseudo-polynomial time algorithm to compute the regret in an edge-weighted arena, by reduction to a target-weighted arena.

We also study the problem of iterated regret minimization. We provide a delete operator that removes strictly dominated strategies. We show how to compute the effect of iterating this operator on tree arenas and strictly positive edge-weighted arenas. In the first case, we provide a quadratic time algorithm and in the second case, a pseudo-exponential time algorithm.

Related works Several notions of equilibria have been proposed in the literature for reasoning on two-player non-zero-sum games, for instance Nash equilibrium, sequential equilibrium and perfect equilibrium; see [8] for an overview. Those equilibria formalize notions of rational behavior by defining optimality criteria for pairs of strategies. As we have seen in the Centipede game example [9], or as can be shown for other examples like the Traveller's dilemma [1], Nash equilibria sometimes suggest pairs of strategies that are rejected by common sense. Regret minimization is an alternative solution concept that sometimes proposes more intuitive solutions and requires more cooperation between players. Recently, non-zero sum games played on graphs have attracted a lot of attention. There have been several papers that study Nash equilibria or particular classes of Nash equilibria [6, 3, 2, 4].

Proofs that are sketched or omitted in the paper are given in the Appendix.

2 Weighted Games and Regret 

Given a cartesian product $A \times B$ of two sets, we denote by $\mathrm{proj}_i$ the $i$-th projection, $i = 1,2$. It is naturally extended to sequences of elements of $A \times B$ by $\mathrm{proj}_i(c_1 \dots c_n) = \mathrm{proj}_i(c_1) \dots \mathrm{proj}_i(c_n)$. For all $k \in \mathbb{N}$, we let $[k] = \{0, \dots, k\}$.

Reachability Games Turn-based two-player games are played on game arenas by two players. A (finite) game arena is a tuple $G = (S = S_1 \uplus S_2, s_0, T)$ where $S_1, S_2$ are finite disjoint sets of player positions ($S_1$ for Player 1 and $S_2$ for Player 2), $s_0 \in S_1$ is the initial position, and $T \subseteq S \times S$ is the transition relation. A finite play on $G$ of length $n$ is a finite word $\pi = \pi_0\pi_1 \dots \pi_n \in S^*$ such that $\pi_0 = s_0$ and for all $i = 0, \dots, n-1$, $(\pi_i, \pi_{i+1}) \in T$. Infinite plays are defined similarly. We denote by $P_f(G)$ (resp. $P_\infty(G)$) the set of finite (resp. infinite) plays on $G$, and we let $P(G) = P_f(G) \cup P_\infty(G)$. For any node $s \in S$, we denote by $(G, s)$ the arena $G$ where the initial position is $s$.

Let $i \in \{1, 2\}$. We let $-i = 1$ if $i = 2$ and $-i = 2$ if $i = 1$. A strategy $\lambda_i : P_f(G) \to S \cup \{\bot\}$ for Player $i$ is a mapping that maps any finite play $\pi$ whose last position, denoted $\mathrm{last}(\pi)$, is in $S_i$ to $\bot$ if there is no outgoing edge from $\mathrm{last}(\pi)$, and to a position $s$ such that $(\mathrm{last}(\pi), s) \in T$ otherwise. The set of strategies of Player $i$ in $G$ is denoted by $\Lambda_i(G)$. Given a strategy $\lambda_{-i} \in \Lambda_{-i}(G)$, the outcome $\mathrm{Out}^G(\lambda_i, \lambda_{-i})$ is a play $\pi = \pi_0 \dots \pi_n \dots$ such that (i) $\pi_0 = s_0$, (ii) if $\pi$ is finite, then there is no outgoing edge from $\mathrm{last}(\pi)$, and (iii) for all $0 \le j < |\pi|$ and all $\kappa = 1, 2$, if $\pi_j \in S_\kappa$, then $\pi_{j+1} = \lambda_\kappa(\pi_0 \dots \pi_j)$. We also define $\mathrm{Out}^G(\lambda_i) = \{\mathrm{Out}^G(\lambda_i, \lambda_{-i}) \mid \lambda_{-i} \in \Lambda_{-i}(G)\}$.

A strategy $\lambda_i$ is memoryless if for all plays $\pi \in P_f(G)$, $\lambda_i(\pi)$ only depends on $\mathrm{last}(\pi)$. Thus $\lambda_i$ can be seen as a function $S_i \to S \cup \{\bot\}$. It is finite-memory if $\lambda_i(\pi)$ only depends on $\mathrm{last}(\pi)$ and on some state of a finite state set.
We refer the reader to for formal definitions. 

A reachability winning condition (rwc for short) for Player $i$ is given by a subset of positions $C_i \subseteq S$, called the target set. A play $\pi \in P(G)$ is winning for Player $i$ if some position of $\pi$ is in $C_i$. A strategy $\lambda_i$ for Player $i$ is winning if all the plays of $\mathrm{Out}^G(\lambda_i)$ are winning. In this paper, we often consider two target sets $C_1, C_2$ for Player 1 and 2 respectively. We write $(S_1, S_2, s_0, T, C_1, C_2)$ to denote the game arena $G$ extended with those target sets. Finally, let $\lambda_i \in \Lambda_i(G)$ be a winning strategy for Player $i$ and $\lambda_{-i} \in \Lambda_{-i}(G)$. Let $\pi_0\pi_1 \dots \in P(G)$ be the outcome of $(\lambda_i, \lambda_{-i})$. The outcome of $(\lambda_i, \lambda_{-i})$ up to $C_i$ is defined by $\mathrm{Out}^{G,C_i}(\lambda_i, \lambda_{-i}) = \pi_0 \dots \pi_n$ such that $n = \min\{j \mid \pi_j \in C_i\}$. We also extend this notation to sets of plays $\mathrm{Out}^{G,C_i}(\lambda_i)$ naturally.



Weighted Games We add weights on edges of arenas and include the target sets. A (finite) weighted game arena is a tuple $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C_1, C_2)$ where $(S, s_0, T)$ is a game arena and, for all $i = 1, 2$, $\mu_i : T \to \mathbb{N}$ is a weight function for Player $i$ and $C_i$ her target set. We let $M_i^G$ be the maximal weight of Player $i$, i.e. $M_i^G = \max_{e \in T} \mu_i(e)$, and $M^G = \max(M_1^G, M_2^G)$.

$G$ is a target-weighted arena (TWA for short) if only the edges leading to a target node are weighted by strictly positive integers, and any two edges leading to the same node carry the same weight. Formally, for all $(s, s') \in T$, if $s' \notin C_i$, then $\mu_i(s, s') = 0$; otherwise for all $(s'', s') \in T$, $\mu_i(s, s') = \mu_i(s'', s')$. Thus for target-weighted arenas, we assume in the sequel that the weight functions map $C_i$ to $\mathbb{N}$.

Let $\pi = \pi_0\pi_1 \dots \pi_n$ be a finite play in $G$. We extend the weight functions to finite plays, so that for all $i = 1,2$, $\mu_i(\pi) = \sum_{j=0}^{n-1} \mu_i(\pi_j, \pi_{j+1})$. The utility $u_i^G(\pi)$ of $\pi$ (for Player $i$) is $+\infty$ if $\pi$ is not winning for Player $i$, and the sum of the weights occurring along the edges defined by $\pi$ until the first visit to a target position otherwise. Formally:

$$u_i^G(\pi) = \begin{cases} +\infty & \text{if } \pi \text{ is not winning for Player } i \\ \sum_{j=0}^{\min\{k \mid \pi_k \in C_i\} - 1} \mu_i(\pi_j, \pi_{j+1}) & \text{otherwise} \end{cases}$$

We extend this notion to the utility of two strategies $\lambda_1, \lambda_2$ of Player 1 and 2 respectively:

$$u_i^G(\lambda_1, \lambda_2) = u_i^G(\mathrm{Out}^G(\lambda_1, \lambda_2))$$

Let $i \in \{1, 2\}$. Given a strategy $\lambda_i \in \Lambda_i(G)$, the best response of Player $-i$ to $\lambda_i$, denoted by $\mathrm{br}_{-i}^G(\lambda_i)$, is the least utility Player $-i$ can achieve against $\lambda_i$. Formally:

$$\mathrm{br}_{-i}^G(\lambda_i) = \min_{\lambda_{-i} \in \Lambda_{-i}(G)} u_{-i}^G(\lambda_i, \lambda_{-i})$$

Regret Let $i \in \{1,2\}$ and let $\lambda_1, \lambda_2$ be two strategies of Player 1 and 2 respectively. The regret of Player $i$ is the difference between the utility Player $i$ achieves and the best response to the strategy of Player $-i$. Formally:

$$\mathrm{reg}_i^G(\lambda_i, \lambda_{-i}) = u_i^G(\lambda_i, \lambda_{-i}) - \mathrm{br}_i^G(\lambda_{-i})$$

Note that $\mathrm{reg}_i^G(\lambda_i, \lambda_{-i}) \ge 0$, since $\mathrm{br}_i^G(\lambda_{-i}) \le u_i^G(\lambda_i, \lambda_{-i})$. The regret of a strategy $\lambda_i$ for Player $i$ is the maximal regret she gets for all strategies of Player $-i$:

$$\mathrm{reg}_i^G(\lambda_i) = \max_{\lambda_{-i} \in \Lambda_{-i}(G)} \mathrm{reg}_i^G(\lambda_i, \lambda_{-i})$$

Finally, the regret of Player $i$ in $G$ is the minimal regret she can achieve:

$$\mathrm{reg}_i^G = \min_{\lambda_i \in \Lambda_i(G)} \mathrm{reg}_i^G(\lambda_i)$$

We let $+\infty - (+\infty) = +\infty$.

Proposition 1. For all $i = 1,2$, $\mathrm{reg}_i^G < +\infty$ iff Player $i$ has a winning strategy.

Proof. If Player $i$ has no winning strategy, then for all $\lambda_i \in \Lambda_i(G)$, there is $\lambda_{-i} \in \Lambda_{-i}(G)$ s.t. $u_i^G(\lambda_i, \lambda_{-i}) = +\infty$. Thus $\mathrm{reg}_i^G(\lambda_i, \lambda_{-i}) = +\infty$. Therefore $\mathrm{reg}_i^G = +\infty$.

If Player $i$ has a winning strategy $\lambda_i$, then for all $\lambda_{-i} \in \Lambda_{-i}(G)$, $u_i^G(\lambda_i, \lambda_{-i}) < +\infty$ and $\mathrm{br}_i^G(\lambda_{-i}) \le u_i^G(\lambda_i, \lambda_{-i}) < +\infty$. Thus $\mathrm{reg}_i^G \le \mathrm{reg}_i^G(\lambda_i) < +\infty$. □

Example 1. Consider the game arena $G$ of Fig. 3. We omit Player 2's weights since we are interested in computing the regret of Player 1. Player 1's positions are circle nodes and Player 2's positions are square nodes. The target nodes are represented by double circles. The initial node is A. Let $\lambda_1$ be the memoryless strategy defined by $\lambda_1(B) = C$ and $\lambda_1(C) = E$. For all $\lambda_2 \in \Lambda_2(G)$, $\mathrm{Out}^G(\lambda_1, \lambda_2)$ is either ACE or ABCE, depending on whether Player 2 goes directly to C or passes by B. In both cases, the outcome is winning and $u_1^G(\lambda_1, \lambda_2) = 3$. What is the regret of playing $\lambda_1$ for Player 1? To compute $\mathrm{reg}_1^G(\lambda_1)$, we should consider all possible strategies of Player 2, but a simple observation allows us to restrict this range. Indeed, to maximize the regret of Player 1, Player 2 should cooperate in the subtrees that $\lambda_1$ prevents the play from entering, i.e. in the subtrees rooted at D and F. Therefore we only have to consider the two following memoryless strategies $\lambda_2$ and $\lambda'_2$: both $\lambda_2$ and $\lambda'_2$ move from F to J and from D to H, but $\lambda_2(A) = B$ while $\lambda'_2(A) = C$. In both cases, going to F is a best response to $\lambda_2$ and $\lambda'_2$ for Player 1, i.e. $\mathrm{br}_1^G(\lambda_2) = \mathrm{br}_1^G(\lambda'_2) = 0$. Therefore we get $\mathrm{reg}_1^G(\lambda_1, \lambda_2) = u_1^G(\lambda_1, \lambda_2) - \mathrm{br}_1^G(\lambda_2) = 3 - 0 = 3$. Similarly $\mathrm{reg}_1^G(\lambda_1, \lambda'_2) = 3$. Therefore $\mathrm{reg}_1^G(\lambda_1) = 3$.

Figure 3: Graph arena with a common weight function.

As a matter of fact, the strategy $\lambda_1$ minimizes the regret of Player 1. Indeed, if she chooses to go from B to D, then Player 2 moves from A to B and from D to G (so that Player 1 gets a utility of 3) and cooperates in the subtree rooted at C by moving from F to J. The regret of Player 1 is therefore 3. If Player 1 moves from B to C and from C to F, then Player 2 moves from A to C and from F to I (so that Player 1 gets a utility of 4), and from D to H, the regret of Player 1 being therefore 4. Similarly, one can show that all other strategies of Player 1 have a regret of at least 3. Therefore $\mathrm{reg}_1^G = 3$.
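The definitions of utility, best response and regret can be checked by brute force on a small arena. The sketch below uses a hypothetical three-leaf tree (it is not the arena of Fig. 3, whose weights are not all recoverable from the text) and enumerates memoryless strategies only. That restriction is an assumption: it is exact on trees, where every position has a single history, but not on general graphs.

```python
from itertools import product
from math import inf

# Hypothetical tree arena: Player 1 owns A, Player 2 owns B, leaves are targets of Player 1.
SUCC = {"A": ["L", "B"], "B": ["X", "Y"], "L": [], "X": [], "Y": []}
OWNER = {"A": 1, "B": 2}
WEIGHT = {("A", "L"): 3, ("A", "B"): 0, ("B", "X"): 0, ("B", "Y"): 5}
TARGETS = {"L", "X", "Y"}
START = "A"


def strategies(player):
    """Enumerate the memoryless strategies of `player` as dicts position -> successor."""
    nodes = [s for s in SUCC if SUCC[s] and OWNER[s] == player]
    for choice in product(*(SUCC[s] for s in nodes)):
        yield dict(zip(nodes, choice))


def utility(strat1, strat2):
    """Sum of Player 1's weights up to the first target, +inf if no target is reached."""
    node, cost, seen = START, 0, set()
    while node not in TARGETS:
        if node in seen or not SUCC[node]:
            return inf  # memoryless loop or dead end: the target is never reached
        seen.add(node)
        nxt = (strat1 if OWNER[node] == 1 else strat2)[node]
        cost += WEIGHT[(node, nxt)]
        node = nxt
    return cost


def regret_of(strat1):
    """Maximal regret of strat1 over all strategies of Player 2."""
    worst = 0
    for strat2 in strategies(2):
        best_response = min(utility(alt, strat2) for alt in strategies(1))
        u = utility(strat1, strat2)
        # Convention of the paper: (+inf) - (+inf) = +inf.
        worst = max(worst, inf if u == inf else u - best_response)
    return worst
```

In this arena the strategy that goes to L guarantees a utility of 3 but has regret 3, whereas going to B risks a utility of 5 and has regret 2. Minimizing regret and minimizing the worst-case utility therefore select different strategies here.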

Note that the strategy $\lambda_1$ does not minimize the regret in the subgame defined by the subtree rooted at C. Indeed, in this subtree, Player 1 has to move from C to F, and the regret of doing this is $4 - 3 = 1$. However the regret of $\lambda_1$ in the subtree is 3. This example illustrates a situation where a strategy that minimizes the regret in the whole game does not necessarily minimize the regret in the subgames. Therefore we cannot apply a simple backward algorithm to compute the regret. As we will see in the next section, we first have to propagate some information in the subgames.

3 Regret Minimization on Target-Weighted Graphs

In this section, our aim is to give an algorithm to compute the regret for Player $i$. This is done by reduction to a min-max game, defined in the sequel. We say that we solve the regret minimization problem (RMP for short) if we can compute the minimal regret and a (finite representation of a) strategy that achieves this value.

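Minmax games Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C_1, C_2)$ be a TWA and $i = 1,2$. We define the value $\mathrm{minmax}_i^G$ as follows:

$$\mathrm{minmax}_i^G = \min_{\lambda_i \in \Lambda_i(G)} \max_{\lambda_{-i} \in \Lambda_{-i}(G)} u_i^G(\lambda_i, \lambda_{-i})$$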

Proposition 2. Given a TWA $G = (S, s_0, T, \mu_1, \mu_2, C_1, C_2)$, $i \in \{1,2\}$ and $K \in \mathbb{N}$, one can decide in time $O(|S| + |T|)$ whether $\mathrm{minmax}_i^G \le K$. The value $\mathrm{minmax}_i^G$ and a memoryless strategy that achieves this value can be computed in time $O(\log_2(M_i^G)(|S| + |T|))$.

Proof. For all $j \ge 0$, we let $W_j \subseteq S$ be the set of positions from which Player $i$ has a strategy to reach a position $s \in C_i$ in at most $j$ steps, such that $\mu_i(s) \le K$ and such that she does not pass by a position $s' \in C_i$ such that $\mu_i(s') > K$. Formally, we denote by $C_i^{>K}$ the set of positions $s \in C_i$ s.t. $\mu_i(s) > K$. Then $W_0 = C_i \setminus C_i^{>K}$ and for all $j > 0$, $W_j = W_{j-1} \cup W_j^{i} \cup W_j^{-i}$, where:

$$W_j^{i} = \{s \in S_i \setminus C_i^{>K} \mid \exists s' \in W_{j-1},\ (s, s') \in T\}$$

$$W_j^{-i} = \{s \in S_{-i} \setminus C_i^{>K} \mid \forall (s, s') \in T,\ s' \in W_{j-1}\}$$

The sequence $W_0, W_1, \dots$ converges in at most $|S|$ steps to a set $W^*$, and $\mathrm{minmax}_i^G \le K$ iff $s_0 \in W^*$. In order to compute $W^*$ in time $O(|S| + |T|)$, we attach to each position a counter that counts the number of its successors that are not already in the current set $W_j$. When adding a new node to $W_j$, we decrement the counters of its predecessors by one (if they were not already 0). Let $s$ be one of its predecessors and $c$ its counter value. If $s \in S_i$ and $c$ is strictly less than the number of its successors, $s$ will be added to $W_{j+1}$. If $s \in S_{-i}$ and $c = 0$, then all the successors of $s$ are in $W_j$, therefore $s$ will be added to $W_{j+1}$.

Now, in order to compute the value $\mathrm{minmax}_i^G$, we use the previous algorithm as the building block of a dichotomy algorithm that starts with the maximal finite value which can be achieved by Player $i$ if she has a winning strategy to her target, i.e. $M_i^G$.

If $\mathrm{minmax}_i^G = +\infty$, then any strategy achieves this value. Otherwise, in order to extract a strategy, it suffices to keep for each position $s \in W_j \cap S_i$ a pointer to a position $s' \in W_{j-1}$ such that $(s, s') \in T$ when computing the sequence of $W_j$'s. Note that this strategy is memoryless. □
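The threshold test and the dichotomy of the proof can be sketched as follows. This is a simplified illustration, not the linear-time procedure of the proof: the fixpoint is recomputed naively instead of using successor counters, adversary dead ends are treated as losing (an assumption of this sketch), and the search runs over the distinct target weights rather than over the interval up to $M_i^G$. The arena and all names are hypothetical.

```python
from math import inf

# Hypothetical TWA: Player 1 owns A, Player 2 owns B; weights sit on target nodes.
SUCC = {"A": ["L", "B"], "B": ["X", "Y"], "L": [], "X": [], "Y": []}
OWNER = {"A": 1, "B": 2}
TARGET_COST = {"L": 3, "X": 0, "Y": 5}


def wins_below(succ, owner, target_cost, start, player, k):
    """True iff `player` can force reaching a target of weight <= k from `start`
    without first visiting a target of weight > k."""
    bad = {s for s, c in target_cost.items() if c > k}
    win = {s for s, c in target_cost.items() if c <= k}
    changed = True
    while changed:  # naive fixpoint over the sets W_0, W_1, ...
        changed = False
        for s, nxt in succ.items():
            if s in win or s in bad:
                continue
            if owner.get(s) == player:
                ok = any(t in win for t in nxt)
            else:
                ok = bool(nxt) and all(t in win for t in nxt)
            if ok:
                win.add(s)
                changed = True
    return start in win


def minmax_value(succ, owner, target_cost, start, player=1):
    """Least k such that `player` can guarantee a utility of at most k, or +inf."""
    costs = sorted(set(target_cost.values()))
    if not costs or not wins_below(succ, owner, target_cost, start, player, costs[-1]):
        return inf
    lo, hi = 0, len(costs) - 1
    while lo < hi:  # dichotomy: the threshold test is monotone in k
        mid = (lo + hi) // 2
        if wins_below(succ, owner, target_cost, start, player, costs[mid]):
            hi = mid
        else:
            lo = mid + 1
    return costs[lo]
```

Here Player 1 can guarantee 3 by moving to L, while moving to B lets Player 2 answer with Y, so the min-max value is 3.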

Since the roles of the players are symmetric, without loss of generality we can focus on computing the regret of Player 1 only. Therefore we do not consider Player 2's targets and weights. Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, C_1)$ be a TWA (assumed to be fixed from now on). Let $\lambda_1 \in \Lambda_1(G)$ be a winning strategy of Player 1 (if it exists). Player 2 can force Player 1 to follow one of the paths of $\mathrm{Out}^{G,C_1}(\lambda_1)$ by choosing a suitable strategy. When choosing a path $\pi \in \mathrm{Out}^{G,C_1}(\lambda_1)$, in order to maximize the regret of Player 1, Player 2 cooperates (i.e. she minimizes the utility) if Player 1 would have deviated from $\pi$. This leads to the notion of best alternative along a path. Informally, the best alternative along $\pi$ is the minimal value Player 1 could have achieved if she had deviated from $\pi$, assuming Player 2 cooperates. Since Player 2 can enforce one of the paths of $\mathrm{Out}^{G,C_1}(\lambda_1)$, to maximize the regret of Player 1, she will choose the path $\pi$ with the highest difference between $u_1^G(\pi)$ and the minimal best alternative along $\pi$. As an example, consider the TWA of Fig. 3. In this example, if Player 1 moves from C to E, then along the path ACE, the best alternative is 0. Indeed, the other alternative was to go from C to F, and in this case Player 2 would have cooperated.

We now formally define the notion of best alternative. Let $s \in S_1$. The best value that can be achieved from $s$ by Player 1 when Player 2 cooperates is defined by:

$$\mathrm{best}_1^G(s) = \min_{\lambda_1 \in \Lambda_1(G,s)} \min_{\lambda_2 \in \Lambda_2(G,s)} u_1^{(G,s)}(\lambda_1, \lambda_2)$$

Let $(s, s') \in T$. The best alternative of choosing $s'$ from $s$ for Player 1, denoted by $\mathrm{ba}_1^G(s, s')$, is defined as the minimal value she could have achieved by choosing another successor of $s$ (assuming Player 2 cooperates). Formally:

$$\mathrm{ba}_1^G(s, s') = \begin{cases} +\infty & \text{if } s \in S_2 \\ \min_{(s, s'') \in T,\ s'' \ne s'} \mathrm{best}_1^G(s'') & \text{if } s \in S_1 \end{cases}$$

with $\min \emptyset = +\infty$. Finally, the best alternative of a path $\pi = s_0 s_1 \dots s_n$ is defined as $+\infty$ if $n = 0$ and as the minimal best alternative of the edges of $\pi$ otherwise:

$$\mathrm{ba}_1^G(\pi) = \min_{0 \le j < n} \mathrm{ba}_1^G(s_j, s_{j+1})$$
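The three definitions above translate directly into code. The sketch below is illustrative: the arena is loosely modeled on the description of Fig. 3 in Example 1, but the figure itself is not available in this text, so the edge set and in particular the weight of H are assumptions. Function names are hypothetical.

```python
from math import inf

# Hypothetical TWA in the spirit of Fig. 3 (the weight of H is an arbitrary assumption).
SUCC = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E", "F"], "D": ["G", "H"],
        "F": ["I", "J"], "E": [], "G": [], "H": [], "I": [], "J": []}
OWNER = {"A": 2, "B": 1, "C": 1, "D": 2, "F": 2}
TARGET_COST = {"E": 3, "G": 3, "H": 1, "I": 4, "J": 0}


def best(s):
    """Least target weight reachable from s when both players cooperate.
    Exploration stops at targets, since the utility is fixed at the first target visited."""
    seen, stack, result = {s}, [s], inf
    while stack:
        node = stack.pop()
        if node in TARGET_COST:
            result = min(result, TARGET_COST[node])
            continue
        for t in SUCC[node]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return result


def best_alternative(s, chosen):
    """ba(s, chosen): best value over the other successors of s, +inf at Player 2 positions."""
    if OWNER.get(s) != 1:
        return inf
    return min((best(t) for t in SUCC[s] if t != chosen), default=inf)


def best_alternative_of_path(path):
    """Minimal best alternative over the edges of the path, +inf for a single position."""
    return min((best_alternative(a, b) for a, b in zip(path, path[1:])), default=inf)
```

With these assumed weights, the best alternative along ACE is 0 (Player 1 could have gone to F, from which a cooperating Player 2 moves to J), which agrees with the value quoted in the text for that path.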

We first transform the graph $G$ into a graph $G'$ such that all the paths that lead to a node $s$ have the same best alternative. This can be done since the number of best alternatives is bounded by $|C_1|$. The construction of $G'$ is done inductively by storing the best alternatives in the positions.

Definition 1. The graph of best alternatives of $G$ is the TWA $G' = (S' = S'_1 \uplus S'_2, s'_0, T', \mu'_1, C'_1)$ defined by:

• $S'_i = S_i \times ([M_1^G] \cup \{+\infty\})$, $i = 1,2$, and $s'_0 = (s_0, +\infty)$;

• for all $(s, b_1), (s', b'_1) \in S'$, $((s, b_1), (s', b'_1)) \in T'$ iff $(s, s') \in T$ and $b'_1 = \min(b_1, \mathrm{ba}_1^G(s, s'))$ if $s \in S_1$, and $b'_1 = b_1$ if $s \in S_2$;

• $C'_1 = S' \cap (C_1 \times [M_1^G])$ and $\forall (s, b) \in C'_1$, $\mu'_1(s, b) = \mu_1(s)$.

Proposition 3. For all $(s, b) \in S'$ and all finite paths $\pi$ in $G'$ from $(s_0, +\infty)$ to $(s, b)$, $\mathrm{ba}_1^{G'}(\pi) = b$.

Because the number of best alternatives is bounded by $|C_1|$, the game $G'$ can be constructed in polynomial time:

Proposition 4. $G'$ can be constructed in time $O((|C_1| + \log_2(M_1^G)) \times (|S| + |T|))$.
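The inductive construction of the graph of best alternatives can be sketched as a forward exploration from $(s_0, +\infty)$. The code below is an illustration under stated assumptions: it builds only the reachable part of $G'$, it uses a hypothetical arena in the spirit of Fig. 3 (the weight of H is arbitrary), and it recomputes best values on demand rather than aiming at the bound of Proposition 4. It also evaluates the local value $\mu_1(s) - \min(\mu_1(s), b)$ attached to a target position $(s, b)$.

```python
from math import inf

# Hypothetical TWA in the spirit of Fig. 3 (the weight of H is an arbitrary assumption).
SUCC = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E", "F"], "D": ["G", "H"],
        "F": ["I", "J"], "E": [], "G": [], "H": [], "I": [], "J": []}
OWNER = {"A": 2, "B": 1, "C": 1, "D": 2, "F": 2}
TARGET_COST = {"E": 3, "G": 3, "H": 1, "I": 4, "J": 0}


def best(s):
    """Least target weight reachable from s under cooperation (stop at first target)."""
    seen, stack, result = {s}, [s], inf
    while stack:
        node = stack.pop()
        if node in TARGET_COST:
            result = min(result, TARGET_COST[node])
            continue
        for t in SUCC[node]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return result


def ba(s, chosen):
    """Best alternative of the edge (s, chosen); +inf at Player 2 positions."""
    if OWNER.get(s) != 1:
        return inf
    return min((best(t) for t in SUCC[s] if t != chosen), default=inf)


def build_graph(start):
    """Reachable part of the graph of best alternatives, as (s, b) -> list of (s', b')."""
    edges, todo = {}, [(start, inf)]
    while todo:
        s, b = todo.pop()
        if (s, b) in edges:
            continue
        edges[(s, b)] = []
        for t in SUCC[s]:
            nb = min(b, ba(s, t))  # ba is +inf at Player 2 positions, so b is unchanged there
            edges[(s, b)].append((t, nb))
            todo.append((t, nb))
    return edges


def local_regret(s, b):
    """Value mu_1(s) - min(mu_1(s), b) of a target position (s, b)."""
    return TARGET_COST[s] - min(TARGET_COST[s], b)
```

In this arena, position C is duplicated: it is reached with best alternative $+\infty$ through A directly and with best alternative 1 through B, so the same original position carries different information depending on the history.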

Since the best alternative information depends only on the paths, the paths of $G$ and those of $G'$ are in bijection. This bijection can be extended to strategies. In particular, we define two mappings $\Phi_i$ from $\Lambda_i(G)$ to $\Lambda_i(G')$, for all $i = 1,2$. For all paths $\pi = s_0 s_1 \dots$ in $G$ (finite or infinite), we denote by $B(\pi)$ the path of $G'$ defined by $(s_0, b_0)(s_1, b_1) \dots$ where $b_0 = +\infty$ and for all $j > 0$, $b_j = \mathrm{ba}_1^G(s_0 \dots s_j)$. The mapping $B$ is bijective, and its inverse corresponds to $\mathrm{proj}_1$.

The mapping $\Phi_i$ maps any strategy $\lambda_i \in \Lambda_i(G)$ to a strategy $\Phi_i(\lambda_i) \in \Lambda_i(G')$ such that $\Phi_i(\lambda_i)$ behaves as $\lambda_i$ on the first projection of the play and adds the best alternative information to the position. Let $h \in S'^*$ such that $\mathrm{last}(h) \in S'_i$. Let $s = \lambda_i(\mathrm{proj}_1(h))$. Then $\Phi_i(\lambda_i)(h) = (s, \mathrm{ba}_1^G(\mathrm{proj}_1(h).s))$. The inverse mapping $\Phi_i^{-1}$ just projects the best alternative information away. In particular, for all $\lambda'_i \in \Lambda_i(G')$ and all $h \in S^*$ such that $\mathrm{last}(h) \in S_i$, $\Phi_i^{-1}(\lambda'_i)(h) = \mathrm{proj}_1(\lambda'_i(B(h)))$.

Then, the $\Phi_i$'s are bijective and $\Phi_1$ preserves the regret values:

Lemma 1. $\forall \lambda_1 \in \Lambda_1(G)$, $\mathrm{reg}_1^G(\lambda_1) = \mathrm{reg}_1^{G'}(\Phi_1(\lambda_1))$.

The best alternative information is crucial to compute the regret. This is global information that allows us to compute the regret locally, as stated by the next lemma. For all $(s, b) \in C'_1$, we let $\nu_1(s, b) = \mu_1(s) - \min(\mu_1(s), b)$. We extend $\nu_1$ to pairs of strategies as usual, $\nu_1(\lambda_1, \lambda_2)$ being infinite if $\lambda_1$ is losing.

Lemma 2. $\forall \lambda_1 \in \Lambda_1(G')$, $\mathrm{reg}_1^{G'}(\lambda_1) = \max_{\lambda_2 \in \Lambda_2(G')} \nu_1(\lambda_1, \lambda_2)$.

Proof. (Sketch) It is clear if $\lambda_1$ is losing. If it is winning, then let $\lambda_2$ be a strategy which maximizes $\mathrm{reg}_1^{G'}(\lambda_1, \lambda_2)$ and let $\pi = \mathrm{Out}^{G',C'_1}(\lambda_1, \lambda_2)$. Without changing the regret values, we can assume that $\lambda_2$ cooperates if Player 1 would have deviated from $\pi$, i.e. $\lambda_2$ minimizes the utility in the subgames $(G', s)$ where $s$ is not the successor of some element of $\pi$. The best response to $\lambda_2$ is either the value $u_1^{G'}(\lambda_1, \lambda_2)$, i.e. $\mu'_1(\mathrm{last}(\pi))$, or the minimal best alternative along $\pi$. By Proposition 3, this minimal best alternative along $\pi$ is exactly $\mathrm{proj}_2(\mathrm{last}(\pi))$. Therefore $\mathrm{br}_1^{G'}(\lambda_2) = \min(\mu'_1(\mathrm{last}(\pi)), \mathrm{ba}_1^{G'}(\pi))$ and $\mathrm{reg}_1^{G'}(\lambda_1) = \nu_1(\mathrm{last}(\pi)) = \nu_1(\lambda_1, \lambda_2)$. Conversely, for any strategy $\lambda_2$ which maximizes $\nu_1(\lambda_1, \lambda_2)$, we can also assume without changing the value $\nu_1(\lambda_1, \lambda_2)$ that $\lambda_2$ cooperates if Player 1 would have deviated from $\mathrm{Out}^{G'}(\lambda_1, \lambda_2)$, and we therefore have $\mathrm{reg}_1^{G'}(\lambda_1, \lambda_2) = \nu_1(\lambda_1, \lambda_2)$. □

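We can now reduce the RMP to a min-max problem:

Lemma 3. Let $H = (S', s'_0, T', \nu_1, C'_1)$ where $S', s'_0, T', C'_1$ are defined in Definition 1. Then

$$\mathrm{reg}_1^G = \mathrm{minmax}_1^H$$

Proof. It is a consequence of Lemmas 1 and 2:

$$\begin{aligned} \mathrm{reg}_1^G &= \min_{\lambda_1 \in \Lambda_1(G)} \mathrm{reg}_1^G(\lambda_1) && \text{(definition)} \\ &= \min_{\lambda_1 \in \Lambda_1(G)} \mathrm{reg}_1^{G'}(\Phi_1(\lambda_1)) && \text{(Lemma 1)} \\ &= \min_{\lambda_1 \in \Lambda_1(H)} \mathrm{reg}_1^{G'}(\lambda_1) && \text{(Lemma 1)} \\ &= \min_{\lambda_1 \in \Lambda_1(H)} \max_{\lambda_2 \in \Lambda_2(H)} \nu_1(\lambda_1, \lambda_2) && \text{(Lemma 2)} \end{aligned}$$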

As a consequence of Propositions 2 and 4 and Lemma 3, we can solve the RMP on TWAs. We first compute the graph of best alternatives and solve a minmax game. This gives us a memoryless strategy that achieves the minimal regret in the graph of best alternatives. To compute a strategy in the original graph, we apply the inverse mapping $\Phi_1^{-1}$: this gives a finite-memory strategy whose memory is exactly the best alternative seen along the current finite play. Therefore the needed memory is bounded by the number of best alternatives, which is bounded by $|C_1|$.

Theorem 1. The RMP on a TWA $G = (S, s_0, T, \mu_1, C_1)$ can be solved in $O(|C_1| \cdot \log_2(M_1^G) \cdot (|S| + |T|))$.

4 Regret Minimization in Edge-Weighted Graphs

In this section, we give a pseudo-polynomial time algorithm to solve the RMP in weighted arenas (with weights on edges). In a first step, we prove that if the regret is finite, the strategies minimizing the regret generate outcomes whose utility is bounded by some value which depends on the graph. This allows us to reduce the problem to the RMP in a TWA, which can then be solved by the algorithm of the previous section.

Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, C_1)$ be a weighted game arena with objective $C_1$. As in the previous section, we assume that we want to minimize the regret of Player 1, so we omit the weight function and the target of Player 2.

Definition 2 (Bounded strategies). Let $B \in \mathbb{N}$ and $\lambda_1 \in \Lambda_1(G)$. The strategy $\lambda_1$ is bounded by $B$ if for all $\lambda_2 \in \Lambda_2(G)$, $u_1^G(\lambda_1, \lambda_2) \le B$.

Note that a bounded strategy is necessarily winning, since by definition, the utility of an outcome is infinite iff it is losing. The following lemma states that the winning strategies that minimize the regret of Player 1 are bounded.

Lemma 4. For all weighted arenas $G = (S, s_0, T, \mu_1, C_1)$ and for any strategy $\lambda_1 \in \Lambda_1(G)$ winning in $G$ for Player 1 that minimizes her regret, $\lambda_1$ is bounded by $2M^G|S|$.

Proof. Since we consider reachability games, it is well-known that if there is a winning strategy for Player 1, there is a memoryless strategy $\gamma_1$ winning for Player 1 (see for instance [5]). In particular, for all $\lambda_2 \in \Lambda_2(G)$, $\mathrm{Out}^{G,C_1}(\gamma_1, \lambda_2)$ does not contain the same position twice. Indeed, if there is a loop, since the strategy is memoryless, Player 2 can force Player 1 to take this loop infinitely many times. Therefore for all $\lambda_2 \in \Lambda_2(G)$, $u_1^G(\gamma_1, \lambda_2) \le M^G|S|$. Therefore the following holds: (★) $\forall \lambda_2 \in \Lambda_2(G)$, $\mathrm{br}_1^G(\lambda_2) \le M^G|S|$. Moreover, $\mathrm{reg}_1^G(\gamma_1) \le M^G|S|$. Indeed, let $\lambda_2$ be a strategy which maximizes $\mathrm{reg}_1^G(\gamma_1, \lambda_2)$. Then $\mathrm{reg}_1^G(\gamma_1) = u_1^G(\gamma_1, \lambda_2) - \mathrm{br}_1^G(\lambda_2)$. Since $u_1^G(\gamma_1, \lambda_2) \le M^G|S|$ and $0 \le \mathrm{br}_1^G(\lambda_2) \le M^G|S|$, we get $\mathrm{reg}_1^G(\gamma_1) \le M^G|S|$. Thus (★★) $\mathrm{reg}_1^G \le M^G|S|$. Finally, let $\lambda_1$ be a winning strategy which minimizes the regret of Player 1, and let $\lambda_2 \in \Lambda_2(G)$. We have $\mathrm{reg}_1^G(\lambda_1, \lambda_2) \le M^G|S|$ (by (★★)), therefore $u_1^G(\lambda_1, \lambda_2) - \mathrm{br}_1^G(\lambda_2) \le M^G|S|$, which gives $u_1^G(\lambda_1, \lambda_2) \le 2M^G|S|$ (by (★)). □

Let $B = 2M^G|S|$. Thanks to Lemma 4 we can reduce the RMP in a weighted arena to the RMP in a TWA. Indeed, it suffices to enrich every position of the arena with the sum of the weights occurring along the path used to reach this position. A position may be reachable by several paths, therefore it will be duplicated as many times as there are different path utilities. This number may be unbounded, but Lemma 4 ensures that it is sufficient to sum the weights up to $B$ only. This may result in a larger graph, but its size is still pseudo-polynomial (polynomial in the maximal weight and the size of the graph).

Definition 3. Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, C_1)$ be a weighted game arena. The graph of utility is the TWA $G' = (S' = S'_1 \uplus S'_2, s'_0, T', \mu'_1, C'_1)$ defined by:

• $S'_i \subseteq S_i \times [B]$, $i = 1, 2$, and $s'_0 = (s_0, 0)$;

• for all $(s, u), (s', u') \in S'$, $((s, u), (s', u')) \in T'$ iff $(s, s') \in T$ and $u' = u + \mu_1(s, s')$;

• $C'_1 = (C_1 \times [B]) \cap S'$ and $\forall (s, u) \in C'_1$, $\mu'_1(s, u) = u$.
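Definition 3 can be illustrated by unfolding a small edge-weighted arena with a cycle. The sketch below is hypothetical in its arena and names, and it builds only the positions reachable from $(s_0, 0)$; moves that would push the accumulated weight above the bound are simply dropped, which leaves dead ends in the unfolding.

```python
# Hypothetical edge-weighted arena with a cycle between A and B; T is the only target.
SUCC = {"A": ["B"], "B": ["A", "T"], "T": []}
WEIGHT = {("A", "B"): 1, ("B", "A"): 1, ("B", "T"): 0}
TARGETS = {"T"}


def build_utility_graph(succ, weight, targets, start, bound):
    """Unfold the arena into positions (s, u), where u is the weight accumulated so far.
    Returns the edges of the unfolding and the weight attached to each target position."""
    edges, todo = {}, [(start, 0)]
    while todo:
        s, u = todo.pop()
        if (s, u) in edges:
            continue
        edges[(s, u)] = []
        for t in succ[s]:
            total = u + weight[(s, t)]
            if total <= bound:  # accumulated weights above the bound are not represented
                edges[(s, u)].append((t, total))
                todo.append((t, total))
    target_cost = {(s, u): u for (s, u) in edges if s in targets}
    return edges, target_cost


# Bound B = 2 * M^G * |S| with M^G = 1 and |S| = 3.
BOUND = 2 * max(WEIGHT.values()) * len(SUCC)
EDGES, TARGET_COST = build_utility_graph(SUCC, WEIGHT, TARGETS, "A", BOUND)
```

Each additional turn around the cycle creates a new copy of A, B and T, until the accumulated weight reaches the bound 6. The unfolding therefore has a size that depends on the weights, which is the pseudo-polynomial blow-up mentioned in the text.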

We now prove that $\mathrm{reg}_1^G = \mathrm{reg}_1^{G'}$. The utility information added to the nodes of $G$ is uniquely determined by the path used to reach the current position. Therefore the strategies of both players in $G$ can naturally be mapped to strategies in $G'$. More formally, we define a mapping $\Phi$ from $\Lambda_1(G) \cup \Lambda_2(G)$ into $\Lambda_1(G') \cup \Lambda_2(G')$. Let $i \in \{1,2\}$ and $\lambda_i \in \Lambda_i(G)$. Let $h \in P_f(G')$ such that $\mathrm{last}(h) \in S'_i$. Let $s = \lambda_i(\mathrm{proj}_1(h))$ and $u = \mu_1(\mathrm{proj}_1(h).s)$. Then

$$\Phi(\lambda_i)(h) = \begin{cases} \bot & \text{if } u > B \\ (s, u) & \text{otherwise} \end{cases}$$

The mapping $\Phi$ is surjective, but not necessarily injective.
Indeed, two strategies that behave similarly up
to a utility $B$ are mapped to the same strategy in $G'$.
Let $\lambda'_i \in \Lambda_i(G')$. Any strategy $\lambda_i \in \Lambda_i(G)$ that behaves
like $\lambda'_i$ (on the first projections of plays) while
the utility of the play is bounded by $B$ is a preimage
of $\lambda'_i$. Formally, for all $h = s_0 s_1 \cdots s_n \in P_f(G)$,
we let $\bar{h} = (s_0, u_0)(s_1, u_1) \ldots (s_n, u_n)$ where for all $j$,
$u_j = \mu_1(s_0 \cdots s_j)$. Then, any strategy $\lambda_i \in \Lambda_i(G)$ is a
preimage of $\lambda'_i$ iff for all finite plays $h \in P_f(G)$ such that
$\mathrm{last}(h) \in S_i$, all $s \in S$, all $u \in \mathbb{N}$, if $\lambda'_i(\bar{h})$ is defined and
equal to $(s, u)$, then $\lambda_i(h) = s$.

Lemma 5. For all $i = 1, 2$, $\Phi(\Lambda_i(G)) = \Lambda_i(G')$.

We denote by $\Lambda_1^B(G)$ the set of strategies bounded by $B$.
The mapping $\Phi$ preserves the regret values for bounded
strategies:

Lemma 6. $\forall \lambda_1 \in \Lambda_1^B(G)$, $\mathrm{reg}_1^G(\lambda_1) = \mathrm{reg}_1^{G'}(\Phi(\lambda_1))$.

Proof. (Sketch) This lemma is supported by the following
result: for all $\lambda_1 \in \Lambda_1(G)$ and $\lambda_2 \in \Lambda_2(G)$,
if $\mathrm{Out}^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$ is winning for Player
1 in $G'$ or $u_1^G(\lambda_1, \lambda_2) \le B$, then $u_1^G(\lambda_1, \lambda_2) = u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$. □

Note that any strategy $\lambda_1 \in \Lambda_1(G)$ is bounded by $B$ iff
$\Phi(\lambda_1)$ is winning in $G'$ for Player 1. We can now prove
the correctness of the reduction:

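Lemma 7. $\mathrm{reg}_1^G = \mathrm{reg}_1^{G'}$.

Proof. Suppose that $\mathrm{reg}_1^G = +\infty$. If $\mathrm{reg}_1^{G'} < +\infty$, then
there is a strategy $\lambda'_1 \in \Lambda_1(G')$ winning in $G'$ for Player
1. By Lemma 5, $\lambda'_1 = \Phi(\lambda_1)$ for some $\lambda_1 \in \Lambda_1(G)$.
Since $\Phi(\lambda_1)$ is winning, $\lambda_1$ is bounded by $B$, and a fortiori
winning. Thus $\mathrm{reg}_1^G < +\infty$, which is a contradiction.
Therefore $\mathrm{reg}_1^G = \mathrm{reg}_1^{G'} = +\infty$.
Suppose that $\mathrm{reg}_1^G < +\infty$. Thus there is a winning strategy
$\lambda_1$ which minimizes the regret. By Lemma 4, $\lambda_1$ is
bounded by $B$. By Lemma 6, $\mathrm{reg}_1^G(\lambda_1) = \mathrm{reg}_1^{G'}(\Phi(\lambda_1))$.
Thus $\mathrm{reg}_1^G = \mathrm{reg}_1^G(\lambda_1) = \mathrm{reg}_1^{G'}(\Phi(\lambda_1)) \ge \mathrm{reg}_1^{G'}$. Conversely,
since $\Phi(\lambda_1)$ is winning in $G'$, there is a winning
strategy $\gamma'_1 \in \Lambda_1(G')$ minimizing the regret. By
Lemma 5, $\gamma'_1 = \Phi(\gamma_1)$ for some $\gamma_1 \in \Lambda_1(G)$. Since
$\Phi(\gamma_1)$ is winning, $\gamma_1$ is bounded by $B$, and by Lemma 6,
$\mathrm{reg}_1^G(\gamma_1) = \mathrm{reg}_1^{G'}(\gamma'_1)$. So $\mathrm{reg}_1^G \le \mathrm{reg}_1^G(\gamma_1) =
\mathrm{reg}_1^{G'}(\gamma'_1) = \mathrm{reg}_1^{G'}$. □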

To solve the RMP for a weighted arena $G$, we first construct
the graph of utility $G'$, and then apply Theorem 1,
since $G'$ is a TWA. Correctness is ensured by Lemma 7.
This returns a finite-memory strategy of $G'$ that minimizes
the regret, whose memory is the best alternative
seen so far. To obtain a strategy of $G$ minimizing the regret,
one applies the inverse mapping $\Phi^{-1}$ defined previously.
This gives us a finite-memory strategy whose
memory is the utility of the current play up to $M^G$ and
the best alternative seen so far.

Theorem 2. The RMP on a weighted arena $G =
(S = S_1 \uplus S_2, s_0, T, \mu_1, C_1)$ can be solved in time
$O\big((M^G)^2 \cdot \log_2(|S| \cdot M^G) \cdot |S| \cdot |C_1| \cdot (|S| + |T|)\big)$.



5 Iterated Regret Minimization (IRM)

In this section, we show how to compute the iterated re- 
gret for tree arenas and for weighted arenas where weights 
are strictly positive (by reduction to a tree arena). 

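Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C_1, C_2)$ be a
weighted arena. Let $i \in \{1, 2\}$, $P_i \subseteq \Lambda_i(G)$ and $P_{-i} \subseteq
\Lambda_{-i}(G)$. The regret of Player $i$ when she plays strategies
of $P_i$ and when Player $-i$ plays strategies of $P_{-i}$ is defined
by:

$$\mathrm{reg}_i^{G,P_i,P_{-i}} = \min_{\lambda_i \in P_i} \max_{\lambda_{-i} \in P_{-i}} \left( u_i^G(\lambda_i, \lambda_{-i}) - \mathrm{br}_i^{G,P_i}(\lambda_{-i}) \right)$$
$$\mathrm{br}_i^{G,P_i}(\lambda_{-i}) = \min_{\lambda_i^* \in P_i} u_i^G(\lambda_i^*, \lambda_{-i})$$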

For all $\lambda_i \in P_i$ and $\lambda_{-i} \in P_{-i}$, we define
$\mathrm{reg}_i^{G,P_i,P_{-i}}(\lambda_i)$ and $\mathrm{reg}_i^{G,P_i,P_{-i}}(\lambda_i, \lambda_{-i})$ accordingly. We
now define the strategies of rank $j$, which are the ones that
survive $j$ rounds of deletion of strictly dominated strategies.
The strategies of rank 0 for Player $i$ are $\Lambda_i(G)$. The
strategies of rank 1 for both players are those which minimize
their regret against strategies of rank 0. More generally,
the strategies of rank $j$ for Player $i$ are the strategies
of rank $j - 1$ which minimize her regret against Player
$-i$'s strategies of rank $j - 1$. Formally, strategies of rank $j$
are obtained via a delete operator $D : 2^{\Lambda_1(G)} \times 2^{\Lambda_2(G)} \to
2^{\Lambda_1(G)} \times 2^{\Lambda_2(G)}$ such that for all $P_1 \subseteq \Lambda_1(G)$ and all
$P_2 \subseteq \Lambda_2(G)$,

$$D(P_1, P_2) = \{\lambda_1 \in P_1 \mid \mathrm{reg}_1^{G,P_1,P_2} = \mathrm{reg}_1^{G,P_1,P_2}(\lambda_1)\} \times \{\lambda_2 \in P_2 \mid \mathrm{reg}_2^{G,P_2,P_1} = \mathrm{reg}_2^{G,P_2,P_1}(\lambda_2)\}$$

We denote by $D^j$ the composition of $D$ $j$ times.
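
To make the delete operator concrete, here is a minimal sketch on an explicit finite game, where each player has finitely many strategies and finite costs. This is an illustration under stated assumptions, not the algorithm of the paper: the dictionaries `cost1` and `cost2` (keyed by the player's own strategy first, then the opponent's) and the function names are hypothetical, and players are assumed to minimize their cost.

```python
def regrets(cost, own, other):
    """Regret of each strategy in `own` against the strategies in `other`.

    cost[(a, b)] is the cost paid by the player when she plays a and the
    opponent plays b (assumed finite).
    """
    # Best response value against each opponent strategy, restricted to `own`.
    best = {b: min(cost[(a, b)] for a in own) for b in other}
    return {a: max(cost[(a, b)] - best[b] for b in other) for a in own}


def delete(cost1, cost2, p1, p2):
    """One application of the delete operator D on the pair (p1, p2)."""
    r1 = regrets(cost1, p1, p2)
    r2 = regrets(cost2, p2, p1)
    m1, m2 = min(r1.values()), min(r2.values())
    kept1 = {a for a in p1 if r1[a] == m1}
    kept2 = {b for b in p2 if r2[b] == m2}
    return (kept1, kept2), (m1, m2)


def iterated_regret(cost1, cost2, p1, p2):
    """Apply D until the pair of strategy sets no longer changes."""
    p1, p2 = set(p1), set(p2)
    while True:
        (q1, q2), regs = delete(cost1, cost2, p1, p2)
        if (q1, q2) == (p1, p2):
            return p1, p2, regs
        p1, p2 = q1, q2
```

On a game where each player keeps a single strategy after the first round, the regrets computed on the stabilized sets drop to 0, which is the same phenomenon as in the Centipede example below.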

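Definition 4 ($j$-th regret). Let $j \ge 0$. The set of strategies
of rank $j$ for Player $i$ is $P_i^j = \mathrm{proj}_i(D^j(\Lambda_1(G), \Lambda_2(G)))$.
The $(j+1)$-th regret for Player $i$ is defined by $\mathrm{reg}_i^{G,j+1} =
\mathrm{reg}_i^{G,P_i^j,P_{-i}^j}$. In particular, $\mathrm{reg}_i^{G,1} = \mathrm{reg}_i^G$.

Proposition 5. Let $i \in \{1, 2\}$. For all $j > 0$, $P_i^{j+1} \subseteq P_i^j$
and $\mathrm{reg}_i^{G,j+1} \le \mathrm{reg}_i^{G,j}$.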
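Proof. (Sketch) $P_i^{j+1} \subseteq P_i^j$ holds by definition of the operator
$D$. For all $\lambda_{-i} \in P_{-i}^j$, $\mathrm{br}_i^{G,P_i^j}(\lambda_{-i}) \ge \mathrm{br}_i^{G,P_i^{j-1}}(\lambda_{-i})$ (because
we minimize over fewer strategies). Thus for all $\lambda_i \in
P_i^j$ and $\lambda_{-i} \in P_{-i}^j$, $\mathrm{reg}_i^{G,j+1}(\lambda_i, \lambda_{-i}) \le \mathrm{reg}_i^{G,j}(\lambda_i, \lambda_{-i})$.
Since $P_{-i}^j \subseteq P_{-i}^{j-1}$, $\mathrm{reg}_i^{G,j+1}(\lambda_i) \le \mathrm{reg}_i^{G,j}(\lambda_i)$ (because
we maximize over fewer strategies). Therefore $\mathrm{reg}_i^{G,j+1} \le
\mathrm{reg}_i^{G,j+1}(\lambda_i) \le \mathrm{reg}_i^{G,j}(\lambda_i) = \mathrm{reg}_i^{G,j}$. □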

Clearly, the sequence of regrets converges: 

Proposition 6. There is an integer $\star \ge 1$ such that for all
$j \ge \star$, for all $i \in \{1, 2\}$, $\mathrm{reg}_i^{G,j} = \mathrm{reg}_i^{G,\star}$.

Definition 5 (iterated regret). For all $i = 1, 2$, the iterated
regret of Player $i$ is $\mathrm{reg}_i^{G,\star}$.

Example 2. As we already saw in the Centipede Game
depicted on Fig. 2, the Player 1's strategy minimizing her
regret is to stop at the last step (move from A to B, from
C to D and from E to S). Its regret value is 1. The Player
2's strategy minimizing her regret is also to stop at the
last step, i.e. to move from B to C and from D to E,
her regret being 1. Therefore $\mathrm{reg}_1^G = \mathrm{reg}_1^{G,1} = 1$ and
$\mathrm{reg}_2^G = \mathrm{reg}_2^{G,1} = 1$. If Player 1 knows that Player 2 will
ultimately move to E, she can play the same strategy as
before, and her regret $\mathrm{reg}_1^{G,2}$ is 0. Similarly $\mathrm{reg}_2^{G,2} = 0$.
Therefore $\mathrm{reg}_1^{G,\star} = \mathrm{reg}_2^{G,\star} = 0$.

5.1 IRM in Tree Arenas 

In this section, we let $i \in \{1, 2\}$ and $G = (S =
S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C_1, C_2)$ be a finite edge-weighted
tree arena. We can transform $G$ into a target-weighted
tree arena such that $C_1 = C_2$ (denoted by $C$ in the sequel)
is the set of leaves of the tree, if we allow the functions
$\mu_i$ to take the value $+\infty$. This transformation results
in a new target-weighted tree arena $G' = (S =
S_1 \uplus S_2, s_0, T, \mu'_1, \mu'_2, C)$ with the same set of states and
transitions as $G$ and for all leaves $s \in C$, $\mu'_i(s) = u_i^G(\pi)$,
where $\pi$ is the root-to-leaf path leading to $s$. The time
complexity of this transformation is $O(|S|)$.
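
The transformation can be sketched as a single traversal of the tree. The sketch below rests on an assumption about the utility that is not restated in this section: the cost of Player $i$ on a root-to-leaf path is taken to be the sum of her weights up to the first visit of $C_i$, and $+\infty$ if $C_i$ is never visited. The function name and the dictionary encoding (`children`, `weights`, `targets`) are hypothetical.

```python
def to_target_weighted(root, children, weights, targets):
    """Compute, for each leaf, the pair of costs (Player 1, Player 2).

    children: dict mapping a node to the list of its children
    weights:  weights[i][(s, t)] is the weight of edge (s, t) for Player i
    targets:  targets[i] is the target set C_i of Player i
    """
    inf = float("inf")
    players = (1, 2)
    # A cost is "frozen" at the first visit of the player's target set.
    frozen0 = tuple(0 if root in targets[i] else None for i in players)
    stack = [(root, (0, 0), frozen0)]
    leaf_cost = {}
    while stack:
        node, sums, frozen = stack.pop()
        succ = children.get(node, [])
        if not succ:
            # Target never visited on this path: the cost is +infinity.
            leaf_cost[node] = tuple(inf if f is None else f for f in frozen)
            continue
        for child in succ:
            new_sums = tuple(
                sums[k] + weights[i][(node, child)] for k, i in enumerate(players)
            )
            new_frozen = tuple(
                frozen[k] if frozen[k] is not None
                else (new_sums[k] if child in targets[i] else None)
                for k, i in enumerate(players)
            )
            stack.append((child, new_sums, new_frozen))
    return leaf_cost
```

Each node is pushed and popped once, so the traversal is linear in the size of the tree.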

We now assume that $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C)$
is a target-weighted tree arena where
$C$ is the set of leaves. Our goal is to define a delete operator
$D$ such that $D(G)$ is a subtree of $G$ such that for
all $i = 1, 2$, $\Lambda_i(D(G))$ are the strategies of $\Lambda_i(G)$ that
minimize $\mathrm{reg}_i^G$. In other words, any pair of subsets of
strategies for both players in $G$ can be represented by a
subtree of $G$. This is possible since all the strategies in a
tree arena are memoryless. A set of strategies $P_i \subseteq \Lambda_i(G)$
is therefore represented by removing from $G$ all the edges
$(s, s')$ such that there is no strategy $\lambda_i \in P_i$ such that
$\lambda_i(s) = s'$. In our case, one first computes the sets of
strategies that minimize regret. This is done as in Section
3 by constructing the tree of best alternatives $H$ (but in
this case with the best alternatives of both players) and by
solving a min-max game. From $H$ we delete all edges
that are not compatible with a strategy that minimizes the
minmax value of some player. We therefore obtain a subtree
$D(H)$ of $H$ such that any strategy of $H$ is a strategy
of $D(H)$ for Player $i$ iff it minimizes the minmax value
in $H$ for Player $i$. By projecting away the best alternative
information in $D(H)$, we obtain a subtree $D(G)$ of
$G$ such that any Player $i$'s strategy of $G$ is a strategy of
$D(G)$ iff it minimizes Player $i$'s regret in $G$. We can iterate
this process to compute the iterated regret, and we
finally obtain a subtree $D^\star(G)$ such that any strategy of $G$
minimizes the iterated regret for Player $i$ iff it is a Player
$i$'s strategy in $D^\star(G)$.

Definition 6. The tree of best alternatives of $G$ is the tree
$H = (S' = S'_1 \uplus S'_2, s'_0, T', \mu'_1, \mu'_2, C')$ defined by:

• $S'_i = \{(s, b_1, b_2) \mid s \in S_i, b_\kappa = \mathrm{ba}_\kappa^G(\pi_s), \kappa = 1, 2\}$,
where $\pi_s$ is the path from the root $s_0$ to $s$;

• $s'_0 = (s_0, +\infty, +\infty)$;

• $\forall s, s' \in S'$, $(s, s') \in T'$ iff $(\mathrm{proj}_1(s), \mathrm{proj}_1(s')) \in T$;

• $C' = \{s \in S' \mid \mathrm{proj}_1(s) \in C\}$;

• $\forall (s, b_1, b_2) \in C'$, $\mu'_i(s, b_1, b_2) = \mu_i(s) - \min(\mu_i(s), b_i)$.

Note that $H$ is isomorphic to $G$. There is indeed a one-to-one
mapping $\Phi$ between the states of $G$ and the states
of $H$: for all $s \in S$, $\Phi(s)$ is the only state $s' \in S'$ of the
form $s' = (s, b_1, b_2)$. Moreover, this mapping is naturally
extended to strategies. Since all strategies are memoryless,
any strategy $\lambda_i \in \Lambda_i(G)$ is a function $S_i \to S$. Thus,
for all $s' \in S'_i$, $\Phi(\lambda_i)(s') = \Phi(\lambda_i(\Phi^{-1}(s')))$. Without
loss of generality and for a technical reason, we assume
that any strategy $\lambda_i$ is only defined for states $s \in S_i$ that
are compatible with this strategy, i.e. if $s$ is not reachable
under $\lambda_i$ then the value of $\lambda_i$ does not need to be defined.
The lemmas of Section 3 still hold for the tree $H$:

Lemma 8. For all $i \in \{1, 2\}$, $\Phi(\Lambda_i(G)) = \Lambda_i(H)$ and
any strategy $\lambda_i \in \Lambda_i(G)$ minimizes $\mathrm{reg}_i^G$ iff $\Phi(\lambda_i)$ minimizes
$\mathrm{minmax}_i^H$. Moreover $\mathrm{reg}_i^G = \mathrm{minmax}_i^H$.

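As in Section 3, the RMP on a tree arena can be
solved by a min-max game. For all $s \in S'$, we define
$\mathrm{minmax}_i^H(s)$ as the minmax value of Player $i$ in the subtree
of $H$ rooted at $s$, and compute this
value by a backward induction algorithm. In particular,
$\mathrm{minmax}_i^H = \mathrm{minmax}_i^H(s'_0)$ and for all $s \in S'$:

$$\mathrm{minmax}_i^H(s) = \begin{cases} \mu'_i(s) & \text{if } s \in C' \\ \min_{(s,s') \in T'} \mathrm{minmax}_i^H(s') & \text{if } s \in S'_i \setminus C' \\ \max_{(s,s') \in T'} \mathrm{minmax}_i^H(s') & \text{if } s \in S'_{-i} \setminus C' \end{cases}$$

Theorem 3. The RMP on a tree arena $G =
(S, s_0, T, \mu_1, \mu_2, C)$ can be solved in $O(|S|)$.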
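
The backward induction can be written as a short recursion over the tree. This is a minimal sketch assuming the tree is given by a `children` dictionary, an `owner` map from nodes to players, and a `leaf_value` map holding the leaf weights of the player under consideration; these names are illustrative only.

```python
def minmax(node, children, owner, leaf_value, player):
    """Min-max value of `player` in the subtree rooted at `node`.

    At her own nodes `player` minimizes; at the opponent's nodes the value
    is maximized, which models the worst case for `player`.
    """
    succ = children.get(node, [])
    if not succ:
        # Leaves carry the weight directly.
        return leaf_value[node]
    values = [minmax(c, children, owner, leaf_value, player) for c in succ]
    return min(values) if owner[node] == player else max(values)
```

Every node is visited exactly once, in line with the linear bound of Theorem 3. For very deep trees an explicit stack would avoid Python's recursion limit.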

The backward algorithm not only allows us to compute
$\mathrm{minmax}_i^H$ for all $i \in \{1, 2\}$, but also to compute a
subtree $D(H)$ that represents all the Player $i$'s strategies
that achieve this value. We actually define the operator
$D$ in two steps. First, we remove the edges $(s, s') \in T'$
such that $s \in S'_i$ and $\mathrm{minmax}_i^H(s') > \mathrm{minmax}_i^H$, for
all $i = 1, 2$. We obtain a new graph $H'$ consisting of several
disconnected tree components. In particular, there are
some states no longer reachable from the root $s'_0$. Then we
keep the connected component that contains $s'_0$ and obtain
a new tree $D(H)$.
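
The two steps (edge removal, then restriction to the component of the root) can be combined in one top-down pass once the minmax values are known. The following sketch assumes the same dictionary encoding as before, with `leaf_values[i]` giving the leaf weights of Player $i$; all names are hypothetical.

```python
def delete_tree(root, children, owner, leaf_values):
    """Sketch of the delete operator on a tree of best alternatives.

    Returns the pruned child lists of the nodes still reachable from the
    root, together with the minmax value of each player at the root.
    """
    value = {1: {}, 2: {}}

    def fill(node, i):
        # Backward induction: Player i minimizes at her nodes, else maximize.
        succ = children.get(node, [])
        if not succ:
            v = leaf_values[i][node]
        else:
            vals = [fill(c, i) for c in succ]
            v = min(vals) if owner[node] == i else max(vals)
        value[i][node] = v
        return v

    root_value = {i: fill(root, i) for i in (1, 2)}

    pruned = {}
    stack = [root]
    while stack:
        node = stack.pop()
        i = owner.get(node)
        # Keep an edge (node, c) only if it does not exceed the owner's
        # global minmax value; unreachable parts are never explored.
        kept = [c for c in children.get(node, []) if value[i][c] <= root_value[i]]
        pruned[node] = kept
        stack.extend(kept)
    return pruned, root_value
```

Exploring only from the root implements the second step implicitly: components disconnected from the root are simply never added to the result.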

Player $i$'s strategies in $D(H)$ are not, in the strict
sense, strategies of $H$, as they do not specify what to play
when Player $-i$ leads Player $i$ to a position that is not in
$D(H)$. More formally, let $\lambda_i$ be a strategy of Player $i$ defined
on $D(H)$ and $\lambda_{-i}$ a strategy of Player $-i$ on $H$. If
there is a position $s$ of $D(H)$ owned by Player $-i$ such that
$\lambda_{-i}$ leads to $s$ when Player $i$ plays $\lambda_i$, and if $\lambda_{-i}(s) = s'$
for some position $s'$ not in $D(H)$, then $\lambda_i(s')$ is undefined.
This never happens when $\lambda_i$ is opposed to a strategy
$\lambda_{-i}$ of $D(H)$, but may happen when opposed to a strategy
$\lambda_{-i}$ of $H$. For this reason, we define the strategies $\lambda_i$ of
$D(H)$ for Player $i$ as the strategies of $H$ such that for all
$s \in S'_i$, $(s, \lambda_i(s))$ is an edge of $H'$. We denote again by
$\Lambda_i(D(H))$ this set of strategies. With this definition, any
strategy $\lambda_i \in \Lambda_i(D(H))$ is defined on its outcomes in $H$,
but when opposed to any strategy $\lambda_{-i} \in \Lambda_{-i}(D(H))$, its
outcomes are in $D(H)$. Thus, when we iterate this operator,
we do not need to remember $H'$ and we can consider
only the tree $D(H)$. The tree $D(H)$ represents the strategies
of $H$ that minimize the regret in the following sense:

Lemma 9. Let $i \in \{1, 2\}$. Let $\lambda_i \in \Lambda_i(H)$;
$\mathrm{minmax}_i^H(\lambda_i) = \mathrm{minmax}_i^H$ iff $\lambda_i \in \Lambda_i(D(H))$.

Since there is a one-to-one correspondence between the
strategies minimizing the regret in $G$ and the strategies
minimizing the minmax value in $H$, we can define $D(G)$
by applying to $D(H)$ the isomorphism $\Phi^{-1}$, in other
words by projecting the best alternatives away, and by
restoring the functions $\mu_i$. The set of strategies $\Lambda_i(D(G))$
of $D(G)$ is defined as $\Phi^{-1}(\Lambda_i(D(H)))$ (in other words,
these are the strategies of $D(H)$ where we project the
best alternatives away). Let $\lambda_i \in \Lambda_i(G)$; by Lemma 8,
it minimizes $\mathrm{reg}_i^G$ iff $\Phi(\lambda_i)$ minimizes $\mathrm{minmax}_i^H$, and
by Lemma 9 iff $\Phi(\lambda_i) \in \Lambda_i(D(H))$, and finally, iff
$\lambda_i \in \Lambda_i(D(G))$. $D(G)$ represents the strategies of $G$
minimizing the regret in the following sense:

Lemma 10. Let $i \in \{1, 2\}$. Let $\lambda_i \in \Lambda_i(G)$; $\mathrm{reg}_i^G(\lambda_i) =
\mathrm{reg}_i^G$ iff $\lambda_i \in \Lambda_i(D(G))$.

We obtain a new tree $D(G)$ whose Player $i$'s strategies
minimize the regret of Player $i$, for all $i = 1, 2$. We
can iterate the regret computation on $D(G)$ and get the
Player $i$'s strategies that minimize the regret of rank 2 of
Player $i$, for all $i = 1, 2$. We continue iterating until we get a
tree $G'$ such that $D(G') = G'$. We let $D^0(G) = G$ and
$D^{j+1}(G) = D(D^j(G))$. Recall that $P_i^j$ are Player $i$'s
strategies of $G$ that minimize the $j$-th regret.

Proposition 7. Let $i \in \{1, 2\}$ and $j > 0$. We have
$\mathrm{reg}_i^{G,j} = \mathrm{reg}_i^{D^{j-1}(G)}$ and $P_i^j = \Lambda_i(D^j(G))$.

Proof. (Sketch) By induction on $j$. It is clear for $j = 1$
and Lemma 10 ensures the correctness of the induction. □

Theorem 4. Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C)$ be
a tree arena. For all $i = 1, 2$, the iterated regret of Player
$i$, $\mathrm{reg}_i^{G,\star}$, can be computed in $O(|S|^2)$.

Proof. By Propositions 6 and 7, there is an integer $j$ such
that $\mathrm{reg}_i^{G,\star} = \mathrm{reg}_i^{D^j(G)}$. According to the definition of
$D(G)$, $j \le |S|$ because we remove at least one edge of
the tree at each step. Since $D(G)$ can be constructed in
$O(|S|)$, the whole time complexity is $O(|S|^2)$. □
5.2 IRM in Positive Weighted Arenas

A weighted arena $G$ is said to be positive if all edges are
weighted by strictly positive weights only. In this section,
we let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C_1, C_2)$ be a
positive weighted arena. Recall that $P_i^j(G)$ is the set of
strategies that minimize $\mathrm{reg}_i^{G,j}$, for all $j > 0$ and $i = 1, 2$.

Definition 7 ($j$-winning and $j$-bounded strategies).
Let $i \in \{1, 2\}$ and $\lambda_i \in \Lambda_i(G)$. The strategy $\lambda_i$ is
$j$-winning if for all $\lambda_{-i} \in P_{-i}^j(G)$, $\mathrm{Out}^G(\lambda_i, \lambda_{-i})$ is
winning. It is $j$-bounded by some $B \ge 0$ if it is $j$-winning,
and for all $\lambda_{-i} \in P_{-i}^j(G)$ and all $\kappa \in \{1, 2\}$,
$\mu_\kappa(\mathrm{Out}^{G,C_i}(\lambda_i, \lambda_{-i})) \le B$.

Note that $j$-boundedness differs from boundedness as
we require that the utilities of both players are bounded.
We let $b^G = 6(M^G)^3|S|$. We get a result similar to the
boundedness of strategies that minimize the regret of rank
1, but for any rank:

Lemma 11. For all $i = 1, 2$ and all $j \ge 0$, all $j$-winning
strategies of Player $i$ which minimize the $(j + 1)$-th regret
are $j$-bounded by $b^G$.

Proof. (Sketch) First, if the regrets of first rank are infinite
for both players, then by definition of the iterated regret,
$P_1^1 = \Lambda_1(G)$ and $P_2^1 = \Lambda_2(G)$ and thus their regrets are
infinite at any rank. Therefore there is no winning strategy
at any rank (otherwise one of the regrets would be finite).

Suppose that the first regret of Player $i$ is finite for some
$i = 1, 2$. By Lemma 4, the winning strategies minimizing
her first regret are bounded by $2M^G|S|$. Since the weights
are strictly positive, the lengths of the outcomes until $C_i$
are bounded by $2M^G|S|$, which allows us to bound the
utilities of Player $-i$ until a first visit to $C_i$ by $2(M^G)^2|S|$.
Since $P_i^j(G) \subseteq P_i^1(G)$ for all $j \ge 1$, the strategies of
Player $i$ (which are necessarily winning as the regret is finite)
at any rank are bounded by $2(M^G)^2|S|$. This bound
is then used (non-trivially) to bound the winning strategies
of Player $-i$ by $6(M^G)^3|S|$. The full proof is in the Appendix. □

Lemma 11 allows us to reduce the problem to the iterated
regret minimization in a weighted tree arena, by
unfolding the graph arena $G$ up to some maximal utility
value. Lemma 11 suggests taking $b^G$ for this maximal
value. However the best responses to a strategy $j$-bounded
by $b^G$ are not necessarily bounded by $b^G$, but
they are necessarily $j$-bounded by $b^G \cdot M^G$, since the
weights are strictly positive. Therefore we let $B^G =
b^G \cdot M^G$ and take $B^G$ as the maximal value. Since the
$j$-winning strategies are $j$-bounded by $b^G$ and the best responses
are $j$-bounded by $B^G$, we do not lose [...]. For $K \ge 0$,
we denote by $P_K(G)$ the set of
finite plays $\pi$ of $G$ such that $\mu_i^G(\pi) \le K$, for all $i = 1, 2$.
Note that $P_K(G)$ is finite since $G$ has only strictly positive
weights. The unfolding of $G$ up to $B^G$ is naturally
defined by a tree weighted arena whose set of positions is
$P_{B^G}(G)$.

Definition 8. Let $G = (S = S_1 \uplus S_2, s_0, T, \mu_1, \mu_2, C_1, C_2)$
be a positive weighted
arena. The $B^G$-unfolding of $G$ is the weighted tree
arena $G' = (S' = S'_1 \uplus S'_2, s'_0, T', \mu'_1, \mu'_2, C'_1, C'_2)$
such that $S'_i = \{\pi \in P_{B^G}(G) \mid \mathrm{last}(\pi) \in S_i\}$ and for
all $\pi, \pi' \in S'$, $(\pi, \pi') \in T'$ iff $(\mathrm{last}(\pi), \mathrm{last}(\pi')) \in T$
and $\pi' = \pi.\mathrm{last}(\pi')$, and for all $i = 1, 2$, $\pi \in C'_i$ iff
$\mathrm{last}(\pi) \in C_i$ and $\mu'_i(\pi, \pi') = \mu_i(\mathrm{last}(\pi), \mathrm{last}(\pi'))$.
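
The unfolding of Definition 8 can be sketched by enumerating finite plays as tuples of states. The code below is an illustration under assumptions: the utility of a finite play is taken to be the plain sum of the edge weights along it, the arena is encoded with dictionaries, and the names `unfold`, `edges`, `w1`, `w2` and `bound` are hypothetical.

```python
def unfold(s0, edges, w1, w2, bound):
    """Enumerate the finite plays whose utilities stay below `bound`.

    edges:  dict mapping a state to the list of its successors
    w1, w2: dicts mapping an edge (s, t) to the weight of Player 1 / Player 2
    Returns the tree (play -> list of child plays) and the utilities of each play.
    """
    if any(w <= 0 for w in list(w1.values()) + list(w2.values())):
        # Termination relies on strictly positive weights.
        raise ValueError("all weights must be strictly positive")
    root = (s0,)
    util = {root: (0, 0)}
    tree = {root: []}
    stack = [root]
    while stack:
        play = stack.pop()
        s = play[-1]
        u1, u2 = util[play]
        for t in edges.get(s, []):
            v1, v2 = u1 + w1[(s, t)], u2 + w2[(s, t)]
            if v1 > bound or v2 > bound:
                # The play would leave P_bound(G): it is not a position.
                continue
            child = play + (t,)
            util[child] = (v1, v2)
            tree[child] = []
            tree[play].append(child)
            stack.append(child)
    return tree, util
```

Because each edge adds at least 1 to both utilities, plays have length at most `bound`, so the enumeration terminates, although the number of plays may be exponential in `bound`.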

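We now prove that $\mathrm{reg}_i^{G,\star} = \mathrm{reg}_i^{G',\star}$, for all $i = 1, 2$.
As for edge-weighted arenas, this is done by defining a
surjective mapping $\Phi$ from $\Lambda_i(G)$ to $\Lambda_i(G')$. For all
$i = 1, 2$ and all $\lambda_i \in \Lambda_i(G)$, and all $\pi \in P_f(G)$ such
that $\mathrm{last}(\pi) \in S_i$, $\Phi(\lambda_i)(\pi) = \bot$ if there is $\kappa \in \{1, 2\}$
such that $\mu_\kappa(\pi.\lambda_i(\pi)) > B^G$, and $\Phi(\lambda_i)(\pi) = \pi.\lambda_i(\pi)$
otherwise. This mapping is surjective, but not injective,
since two strategies that behave similarly up to some utility
$B^G$ are mapped to the same strategy.

Lemma 12. For all $j \ge 1$, $\Phi(P_i^j(G)) = P_i^j(G')$ and for
all $\lambda_i \in P_i^j(G)$, $\mathrm{reg}_i^{G,j}(\lambda_i) = \mathrm{reg}_i^{G',j}(\Phi(\lambda_i))$.

This allows us to prove the correctness of the reduction:

Lemma 13. For all $i = 1, 2$, $\mathrm{reg}_i^{G,\star} = \mathrm{reg}_i^{G',\star}$.

Proof. We prove that for all $j \ge 1$, $\mathrm{reg}_i^{G,j} = \mathrm{reg}_i^{G',j}$.
Let $\lambda_i \in P_i^j(G)$. By definition of $P_i^j(G)$, $\lambda_i$ minimizes
the $j$-th regret, so that $\mathrm{reg}_i^{G,j}(\lambda_i) = \mathrm{reg}_i^{G,j}$. By Lemma 12,
$\mathrm{reg}_i^{G,j}(\lambda_i) = \mathrm{reg}_i^{G',j}(\Phi(\lambda_i))$. Also by Lemma 12, $\Phi(\lambda_i) \in P_i^j(G')$,
therefore $\Phi(\lambda_i)$ minimizes the $j$-th regret in $G'$, so that
$\mathrm{reg}_i^{G',j}(\Phi(\lambda_i)) = \mathrm{reg}_i^{G',j}$, from which we get $\mathrm{reg}_i^{G,j} =
\mathrm{reg}_i^{G',j}$. □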

By applying the algorithm of Section 5.1 we get:

Theorem 5. The iterated regret for both players in a
positive weighted arena $G$ can be computed in pseudo-exponential
time (exponential in $|S|$, $|T|$ and $M^G$).

For all $i = 1, 2$, the procedure of Section 5.1 returns
a finite-memory strategy $\lambda'_i$ minimizing the iterated regret
in $G'$ whose memory is the best alternatives seen so far by
both players. From $\lambda'_i$ we can compute a finite-memory
strategy in $G$ minimizing the iterated regret of Player $i$;
the needed memory is the best alternatives seen by both
players and the current finite play up to $B^G$. When the
utility is greater than $B^G$, then any move is allowed.
Therefore one needs to add one more bit of memory expressing
whether the utility is greater than $B^G$.

Finally, the unfolding of the graph arena up to $B^G$ is
used to finitely represent the (potentially infinite) sets of
strategies of rank $j$ in $G$. Finding such a representation
is not obvious for the full class of weighted arenas, since
before reaching her objective, a player can take a 0-cost
loop finitely many times without affecting her minimal
regret. This suggests adding fairness conditions on edges
to compute the iterated regret. This is illustrated by the
following example.



Figure 4: Free loops



Example 3. Consider the left example of Fig. 4. Player
1's strategies minimizing the regret are those that pass
finitely many times by the edge (A, B) and finally move
to C. The regret is therefore 5. Similarly, the strategies
minimizing Player 2's regret are those that pass finitely
many times by (B, A) and finally move to C. The regret
is 5 as well. The regret of rank 2 for Player 1 is 5 as well,
and the set of strategies minimizing it is also the same as
before (and similarly for Player 2). Indeed, the regret of
a Player 1's strategy that passes K times by (A, B) is 5,
since Player 2 can maximize her regret with a strategy
that passes at least K times by (B, A). Thus $\mathrm{reg}_1^{G,\star} =
\mathrm{reg}_2^{G,\star} = 5$.

On the right example, Player 1 has no winning strategy
at the first rank and her regret is $+\infty$. However the
strategies of Player 2 minimizing her regret are the ones
that pass finitely many times through the loop. Therefore
all the strategies of Player 1 are winning at rank 2. The
iterated regret of both players is 0.

6 Conclusion 

The theory of infinite qualitative non-zero sum games over
graphs is still in an initial development stage. We adapted 
a new solution concept from strategic games to game 
graphs, and gave algorithms to compute the regret and 
iterated regret. The strategies returned by those algo- 
rithms have a finite memory. One open question is to 
know whether this memory is necessary. In other words, 
are memoryless strategies sufficient to minimize the (iter- 
ated) regret in game graphs? Another question is to de- 
termine the lower bound on the complexity of (iterated) 
regret minimization. Iterated regret minimization over the 
full class of graphs is still open. Finally, we think that this 
work can easily be extended to an n-player setting. 
7 Appendix

7.1 Missing Proofs of Section 3
Proposition 3

Proof. Proof by induction on $|\pi|$.

If $|\pi| = 0$, then $\mathrm{ba}_1^{G'}(\pi) = +\infty = b_0$.

We now assume that the property is true for any finite
path $\pi$ in $G'$ from $s'_0$ to some $(s, b)$ of length $k$. Let $\pi =
(s_0, b_0) \ldots (s_k, b_k)(s_{k+1}, b_{k+1})$ be a path of length $k + 1$,
and write $s'_j = (s_j, b_j)$. We have:

$\mathrm{ba}_1^{G'}(\pi)$
$= \min_{0 \le j < k+1} \mathrm{ba}_1^{G'}(s'_j, s'_{j+1})$
$= \min(\min_{0 \le j < k} \mathrm{ba}_1^{G'}(s'_j, s'_{j+1}), \mathrm{ba}_1^{G'}(s'_k, s'_{k+1}))$
$= \min(b_k, \mathrm{ba}_1^{G'}(s'_k, s'_{k+1}))$ by induction hypothesis
$= \min(b_k, \mathrm{ba}_1^G(s_k, s_{k+1}))$ by (*)
$= b_{k+1}$ by definition of $G'$

(*) According to Definition 1, $\forall (s, b) \in C'_1$: $\mu'_1(s, b) =
\mu_1(s)$. Thus $\forall (s, b) \in S'$, $\mathrm{best}_1^{G'}((s, b)) = \mathrm{best}_1^G(s)$
and $\forall ((s, b), (s', b')) \in T'$, $\mathrm{ba}_1^{G'}((s, b), (s', b')) =
\mathrm{ba}_1^G(s, s')$. □

Proposition S] 

Proof. Constructing $G'$ is done in three steps:

1. compute all the values $\mathrm{best}_1^G(s)$, for all $s \in S$;
this step is equivalent to looking for the shortest
path to the objective and has a complexity of
$O(\log_2(M^G)(|S| + |T|))$.

2. compute all the values $\mathrm{ba}_1^G(s, s')$, for all $(s, s') \in T$
such that $s \in S_1$; it can be computed with a time
complexity $O(|T|)$.

3. construct $G'$ by a fixpoint algorithm; this graph has
at most $|C_1| \times |S|$ states and $|C_1| \times |T|$ transitions.

□
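Lemma 2

Proof. Let $\lambda_1 \in \Lambda_1(G')$. If $\lambda_1$ is losing, there is a
strategy $\lambda_2 \in \Lambda_2(G')$ such that $\mathrm{Out}^{G'}(\lambda_1, \lambda_2)$ is losing.
Therefore $\mathrm{reg}_1^{G'}(\lambda_1) = \mathrm{reg}_1^{G'}(\lambda_1, \lambda_2) = +\infty =
\nu_1(\lambda_1, \lambda_2) = \max_{\lambda_2 \in \Lambda_2(G')} \nu_1(\lambda_1, \lambda_2)$.

Suppose that $\lambda_1$ is a winning strategy and let $\lambda_2 \in
\Lambda_2(G')$ be a strategy which maximizes $\mathrm{reg}_1^{G'}(\lambda_1, \lambda_2)$. Let $\pi =
s_0 s_1 \ldots s_n = \mathrm{Out}^{G',C'_1}(\lambda_1, \lambda_2)$. We define a strategy $\lambda'_2$
that plays as $\lambda_2$ on $\pi$ and cooperates with Player 1 if she
would have deviated from $\pi$. Formally, for all $h \in P_f(G')$
such that $\mathrm{last}(h) \in S'_2$, we let $\lambda'_2(h) = s_{j+1}$ if there
is $j < n$ such that $h = s_0 s_1 \ldots s_j$. Otherwise we let
$\lambda'_2(h) = s$ such that $(\mathrm{last}(h), s) \in T'$ and $\mathrm{best}_1^{G'}(s)$ is
minimal (among the successors of $\mathrm{last}(h)$).

Clearly, $\pi = \mathrm{Out}^{G',C'_1}(\lambda_1, \lambda'_2)$ and $\mathrm{br}_1^{G'}(\lambda'_2) \le
\mathrm{br}_1^{G'}(\lambda_2)$. Therefore $\mathrm{reg}_1^{G'}(\lambda_1, \lambda_2) \le \mathrm{reg}_1^{G'}(\lambda_1, \lambda'_2)$.
Since $\lambda_2$ maximizes the regret, we get $\mathrm{reg}_1^{G'}(\lambda_1, \lambda_2) =
\mathrm{reg}_1^{G'}(\lambda_1, \lambda'_2)$.

The best response to $\lambda'_2$ either deviates from $\pi$ or not. If
the best response deviates from $\pi$ at a node $s_j$, $j < n$,
i.e. chooses a node $s'$ such that $s' \ne s_{j+1}$, then the
utility of the best response, according to the definition
of $\lambda'_2$, is $\mathrm{best}_1^{G'}(s')$. The best response to $\lambda'_2$ minimizes
over all those possibilities, therefore $\mathrm{br}_1^{G'}(\lambda'_2) =
\min(\mu'_1(s_n), \min_{j<n, (s_j,s') \in T', s' \ne s_{j+1}} \mathrm{best}_1^{G'}(s'))$, i.e.
$\min(\mu'_1(s_n), \mathrm{ba}_1^{G'}(\pi))$. By Proposition 3, $\mathrm{ba}_1^{G'}(\pi) =
\mathrm{proj}_2(s_n)$. Therefore $\mathrm{reg}_1^{G'}(\lambda_1, \lambda_2) = \mathrm{reg}_1^{G'}(\lambda_1, \lambda'_2) =
\mu'_1(s_n) - \min(\mu'_1(s_n), \mathrm{proj}_2(s_n)) = \nu_1(s_n)$. From which
we get $\mathrm{reg}_1^{G'}(\lambda_1) \le \max_{\lambda_2 \in \Lambda_2(G')} \nu_1(\lambda_1, \lambda_2)$.

Conversely, let $\lambda_2$ be a strategy which maximizes $\nu_1(\lambda_1, \lambda_2)$. Since
$\lambda_1$ is winning, we can define $\pi = \mathrm{Out}^{G',C'_1}(\lambda_1, \lambda_2)$.
As in the first direction of the proof, one can
construct a strategy $\lambda'_2$ that plays like $\lambda_2$ along $\pi$
and cooperates with Player 1 when deviating from $\pi$.
Clearly, this strategy has the same outcome as $\lambda_2$
and we get $\mathrm{reg}_1^{G'}(\lambda_1, \lambda'_2) = \nu_1(\lambda_1, \lambda_2)$. Finally we
have $\mathrm{reg}_1^{G'}(\lambda_1) \ge \mathrm{reg}_1^{G'}(\lambda_1, \lambda'_2) = \nu_1(\lambda_1, \lambda_2) =
\max_{\lambda_2} \nu_1(\lambda_1, \lambda_2)$. □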

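Lemma 1

Proof. The mapping $\Phi$ has been defined in the paper. It
remains to prove that $\mathrm{reg}_1^G(\lambda_1) = \mathrm{reg}_1^{G'}(\Phi(\lambda_1))$, for all
$\lambda_1 \in \Lambda_1(G)$.

For all $\lambda_1 \in \Lambda_1(G)$, all $\lambda_2 \in \Lambda_2(G)$, $\mathrm{Out}^G(\lambda_1, \lambda_2) =
\mathrm{proj}_1(\mathrm{Out}^{G'}(\Phi(\lambda_1), \Phi(\lambda_2)))$. Therefore $u_1^G(\lambda_1, \lambda_2) =
u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$.

Finally:

$\mathrm{reg}_1^G(\lambda_1)$
$= \max_{\lambda_2 \in \Lambda_2(G)} \left( u_1^G(\lambda_1, \lambda_2) - \min_{\lambda_1^* \in \Lambda_1(G)} u_1^G(\lambda_1^*, \lambda_2) \right)$
$= \max_{\lambda_2 \in \Lambda_2(G)} \left( u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2)) - \min_{\lambda_1^* \in \Lambda_1(G)} u_1^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2)) \right)$
$= \max_{\lambda'_2 \in \Lambda_2(G')} \left( u_1^{G'}(\Phi(\lambda_1), \lambda'_2) - \min_{\lambda'^*_1 \in \Lambda_1(G')} u_1^{G'}(\lambda'^*_1, \lambda'_2) \right)$
(since $\Phi(\Lambda_i(G)) = \Lambda_i(G')$ for all $i = 1, 2$)
$= \mathrm{reg}_1^{G'}(\Phi(\lambda_1))$

□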



7.2 Missing Proofs of Section 4

Lemma 6. The proof of this lemma is supported by the
following lemma, which says that under certain conditions,
the utility of the outcomes in $G$ and $G'$ are equal
modulo $\Phi$.

Lemma 14. Let $\lambda_1 \in \Lambda_1(G)$ and $\lambda_2 \in \Lambda_2(G)$. If
$\mathrm{Out}^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$ is winning for Player 1 in $G'$ or
$u_1^G(\lambda_1, \lambda_2) \le B$, then $u_1^G(\lambda_1, \lambda_2) = u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$.

Proof. If $u_1^G(\lambda_1, \lambda_2) \le B$, then $\mathrm{Out}^G(\lambda_1, \lambda_2)$ is winning,
and we let $\pi = \mathrm{Out}^{G,C_1}(\lambda_1, \lambda_2)$. We enrich $\pi$
with the utilities of Player 1 by defining a path $\pi' =
(s_0, u_0) \ldots (s_n, u_n)$ where $\pi = s_0 \ldots s_n$ and for all
$j \le n$, $u_j = \mu_1^G(s_0 \ldots s_j)$. Since $\pi$ is bounded, we
have $u_j \le B$ for all $j \le n$, and by definition of
$G'$, $\pi'$ is a path of $G'$. By definition of $\Phi$ we clearly
have $\pi' = \mathrm{Out}^{G',C'_1}(\Phi(\lambda_1), \Phi(\lambda_2))$, from which we get
$u_1^G(\lambda_1, \lambda_2) = u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$.

If $\mathrm{Out}^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$ is winning for Player 1, we
let $\pi' = \mathrm{Out}^{G',C'_1}(\Phi(\lambda_1), \Phi(\lambda_2))$ and $\pi = \mathrm{proj}_1(\pi')$.
Clearly, $\pi$ is a winning play for Player 1 in $G$ and by
definition of $\Phi$, $\pi = \mathrm{Out}^{G,C_1}(\lambda_1, \lambda_2)$, from which we
get the equality of the utilities. □

We can now prove Lemma 6.
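Proof. (Proof of Lemma 6) Let $\lambda_2 \in \Lambda_2(G)$ be a strategy which maximizes
$\mathrm{reg}_1^G(\lambda_1, \lambda_2)$, and $\lambda_1^*$ be the best response to $\lambda_2$.
Therefore $\mathrm{reg}_1^G(\lambda_1) = u_1^G(\lambda_1, \lambda_2) - u_1^G(\lambda_1^*, \lambda_2)$. Since
$\lambda_1$ is bounded by $B$, we have $u_1^G(\lambda_1, \lambda_2) \le B$ and
$u_1^G(\lambda_1^*, \lambda_2) \le B$ (since $\lambda_1^*$ is at least as good as $\lambda_1$). By
Lemma 14 we get $u_1^G(\lambda_1, \lambda_2) = u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$ and
$u_1^G(\lambda_1^*, \lambda_2) = u_1^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2))$.

By definition of the best response, $\mathrm{br}_1^{G'}(\Phi(\lambda_2)) \le
u_1^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2))$. Therefore

$\mathrm{reg}_1^G(\lambda_1) = \mathrm{reg}_1^G(\lambda_1, \lambda_2)$
$= u_1^G(\lambda_1, \lambda_2) - u_1^G(\lambda_1^*, \lambda_2)$
$= u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2)) - u_1^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2))$
$\le u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2)) - \mathrm{br}_1^{G'}(\Phi(\lambda_2))$
$= \mathrm{reg}_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$
$\le \mathrm{reg}_1^{G'}(\Phi(\lambda_1))$

Conversely, since $\Phi(\Lambda_2(G)) = \Lambda_2(G')$, there
exists $\lambda_2 \in \Lambda_2(G)$ such that $\Phi(\lambda_2)$ maximizes
$\mathrm{reg}_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$. Similarly, there is $\lambda_1^* \in \Lambda_1(G)$
such that $\Phi(\lambda_1^*)$ is the best response to $\Phi(\lambda_2)$. Since $\lambda_1$ is
bounded by $B$, we have $u_1^G(\lambda_1, \lambda_2) \le B$, and by Lemma
14 we get $u_1^G(\lambda_1, \lambda_2) = u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$. Since $\lambda_1$ is
winning for Player 1 and bounded by $B$ in $G$, $\Phi(\lambda_1)$ is
also winning for Player 1 in $G'$. Therefore $\Phi(\lambda_1^*)$ is also
winning for Player 1 in $G'$ (since it does at least as well as
$\Phi(\lambda_1)$ against $\Phi(\lambda_2)$). Therefore $\mathrm{Out}^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2))$
is winning for Player 1 in $G'$, and by Lemma 14 we get
$u_1^G(\lambda_1^*, \lambda_2) = u_1^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2))$. Finally:

$\mathrm{reg}_1^{G'}(\Phi(\lambda_1))$
$= \mathrm{reg}_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$
$= u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2)) - u_1^{G'}(\Phi(\lambda_1^*), \Phi(\lambda_2))$
$= u_1^G(\lambda_1, \lambda_2) - u_1^G(\lambda_1^*, \lambda_2)$
$\le u_1^G(\lambda_1, \lambda_2) - \mathrm{br}_1^G(\lambda_2)$
$= \mathrm{reg}_1^G(\lambda_1, \lambda_2)$
$\le \mathrm{reg}_1^G(\lambda_1)$

□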

7.3 Missing Proofs of Section 5

Lemma 8. By projecting away the best alternatives of
Player $-i$ in $H$, we get a tree isomorphic to $H$ which corresponds
exactly to the tree of best alternatives defined in
Section 3, in which all the results stated in Lemma 8 have
been already proved. Clearly, adding the best alternatives
of the other player does not change those results.

Lemma 9

Proof. Let $\lambda_i \in \Lambda_i(H)$ such that $\mathrm{minmax}_i^H(\lambda_i) =
\mathrm{minmax}_i^H$. Let $s \in S'_i$ be a position of $D(H)$ compatible
with $\lambda_i$, i.e. such that there is $\lambda_{-i} \in \Lambda_{-i}(D(H))$
such that $s$ occurs in $\mathrm{Out}^{D(H)}(\lambda_i, \lambda_{-i})$. Let $s' = \lambda_i(s)$.
We have to prove that $s'$ is a position of $D(H)$. We have
$\mathrm{minmax}_i^H(s') \le \mathrm{minmax}_i^H(\lambda_i) = \mathrm{minmax}_i^H$. Indeed,
since Player $-i$ is able to enforce Player $i$ to go to $s'$ when
she plays $\lambda_i$, if $\mathrm{minmax}_i^H(s') > \mathrm{minmax}_i^H$, then $\lambda_i$ does
not minimize $\mathrm{minmax}_i^H$. According to the definition of
the delete operator, $(s, s')$ is an edge of $D(H)$. Thus $s'$ is
a position of $D(H)$, and $\lambda_i \in \Lambda_i(D(H))$.

Conversely, suppose that $\lambda_i \in \Lambda_i(D(H))$. We proceed by reductio
ad absurdum.

If $\mathrm{minmax}_i^H(\lambda_i) > \mathrm{minmax}_i^H$, there exists
$\lambda_{-i} \in \Lambda_{-i}(H)$ such that $\mathrm{minmax}_i^H(\lambda_i, \lambda_{-i}) =
\mathrm{minmax}_i^H(\lambda_i) > \mathrm{minmax}_i^H$. We let $\pi = \pi_0 \ldots \pi_n =
\mathrm{Out}^H(\lambda_i, \lambda_{-i})$. Let $s, b_1, b_2$ be such that $\pi_n = (s, b_1, b_2)$.
Since $\pi_n \in C'$, $\mathrm{minmax}_i^H(\lambda_i) = \mu'_i(\pi_n) =
\mathrm{minmax}_i^H(\pi_n)$. We consider the last position $\pi_k$ along
$\pi$ such that $k < n$ and $\pi_k$ is owned by Player $i$, i.e.
$\pi_k \in S'_i$ and $\pi_{k+1} \ldots \pi_{n-1} \in (S'_{-i})^*$. This position
exists, otherwise all positions $\pi_0, \ldots, \pi_{n-1}$ are owned
by Player $-i$, and therefore $\mathrm{minmax}_i^H \ge \mu'_i(\pi_n) =
\mathrm{minmax}_i^H(\lambda_i)$, which contradicts our hypothesis. Since
$\lambda_i \in \Lambda_i(D(H))$, by definition of $\Lambda_i(D(H))$, $(\pi_k, \pi_{k+1})$
is an edge of $H'$. Since from $\pi_{k+1}$ Player $-i$ can enforce
Player $i$ to go to $\pi_n$ (since there are only positions
owned by Player $-i$ along $\pi_{k+1} \ldots \pi_{n-1}$), we have
$\mathrm{minmax}_i^H(\pi_{k+1}) \ge \mathrm{minmax}_i^H(\pi_n) > \mathrm{minmax}_i^H$. Since
$\pi_k \in S'_i$, this contradicts the definition of $H'$ (and $D(H)$),
because the edge $(\pi_k, \pi_{k+1})$ would have been removed.
Thus $\mathrm{minmax}_i^H(\lambda_i) = \mathrm{minmax}_i^H$. □

Proposition 

Proof. Proof by induction on j. 

If j = 1, we have regf 4 = regf (G) = regf and by 
Lemma|9] P? = A t (D(G)). 



16 



We assume that $\mathrm{reg}_i^{G,j} = \mathrm{reg}_i^{D^{j-1}(G)}$ and $P_i^j = \Lambda_i(D^j(G))$. By definition, $\mathrm{reg}_i^{G,j+1}$ is the regret of Player $i$ when the players are restricted to the strategy sets $P_i^j$ and $P_{-i}^j$. By induction hypothesis, $P_{-i}^j = \Lambda_{-i}(D^j(G))$, thus $\mathrm{reg}_i^{G,j+1} = \mathrm{reg}_i^{D^j(G)}$.

Moreover, $\lambda_i \in P_i^{j+1}$ iff $\lambda_i \in P_i^j$ and $\mathrm{reg}_i^{G,j+1}(\lambda_i) = \mathrm{reg}_i^{G,j+1}$. By induction hypothesis, $P_i^j = \Lambda_i(D^j(G))$. We demonstrated that $\mathrm{reg}_i^{G,j+1} = \mathrm{reg}_i^{D^j(G)}$. By Lemma 9 applied to the tree $D^j(G)$, we have $\mathrm{reg}_i^{D^j(G)}(\lambda_i) = \mathrm{reg}_i^{D^j(G)}$ iff $\lambda_i \in \Lambda_i(D(D^j(G)))$. By definition,

\[
\mathrm{reg}_i^{G,j+1}(\lambda_i) = \max_{\lambda_{-i} \in P_{-i}^j} \Big( u_i^G(\lambda_i, \lambda_{-i}) - \min_{\lambda_i^* \in P_i^j} u_i^G(\lambda_i^*, \lambda_{-i}) \Big).
\]

Since $P_i^j = \Lambda_i(D^j(G))$ and $P_{-i}^j = \Lambda_{-i}(D^j(G))$, we have

\[
\mathrm{reg}_i^{D^j(G)}(\lambda_i) = \max_{\lambda_{-i} \in \Lambda_{-i}(D^j(G))} \Big( u_i^G(\lambda_i, \lambda_{-i}) - \min_{\lambda_i^* \in \Lambda_i(D^j(G))} u_i^G(\lambda_i^*, \lambda_{-i}) \Big).
\]

So $\mathrm{reg}_i^{G,j+1}(\lambda_i) = \mathrm{reg}_i^{D^j(G)}(\lambda_i)$. Consequently, $\lambda_i \in P_i^{j+1}$ iff $\lambda_i \in \Lambda_i(D^{j+1}(G))$. □

7.4 Missing Proofs of Section O

In this section, we prove several lemmas that do not appear in the body of the paper, in particular those needed to prove Lemma 13.

Lemma 11

Proof. Suppose that neither player has a winning strategy in $G$. Therefore $\mathrm{reg}_1^G = \mathrm{reg}_1^{G,1} = \mathrm{reg}_2^G = \mathrm{reg}_2^{G,1} = +\infty$, $P_1^j = \Lambda_1(G)$ and $P_2^j = \Lambda_2(G)$. It is easy to verify that, for every rank $j$, neither player has a $j$-winning strategy.

Suppose that Player $i$ has a winning strategy, for some $i = 1,2$. Therefore by Lemma @] the strategies minimizing the regret are bounded by $2M^G|S|$. Since $P_i^j \subseteq P_i^1 \subseteq P_i^0 = \Lambda_i(G)$ for all $j > 0$, every strategy of $P_i^j$ is bounded by $2M^G|S|$. Let $j > 0$, $\lambda_i \in P_i^j$ and $\lambda_{-i} \in \Lambda_{-i}(G)$. Let $\pi = \mathrm{Out}^{G,C_i}(\lambda_i, \lambda_{-i})$. We have $\mu_i(\pi) \le 2M^G|S|$, and since the weights are strictly positive integers, $|\pi| \le 2M^G|S|$. Therefore $\mu_{-i}(\pi) \le 2(M^G)^2|S|$. In other words, for all $j > 0$, every strategy of $P_i^j$ is $j$-winning and $j$-bounded by $2(M^G)^2|S|$.

It remains to prove that the $j$-winning strategies of Player $-i$ minimizing the $(j+1)$-th regret are also $j$-bounded. Let $j_0 \ge 0$ be the first natural number such that $P_{-i}^{j_0}$ contains a $j_0$-winning strategy (if it exists). If $j_0 = 0$, then $P_{-i}^{j_0} = \Lambda_{-i}(G)$. If $j_0 > 0$, then no strategy of $P_{-i}^{j_0-1}$ is $(j_0-1)$-winning by definition of $j_0$, so that $\mathrm{reg}_{-i}^{G,j_0} = +\infty$, from which we get $P_{-i}^{j_0} = \Lambda_{-i}(G)$. In both cases, we have $P_{-i}^{j_0} = \Lambda_{-i}(G)$.

Since after reaching her objective, Player $i$ can play however she wants without affecting her regret, there is a strategy $\gamma_{-i} \in \Lambda_{-i}(G)$ that wins against all strategies of $P_i^{j_0}$ and which is memoryless once Player $i$ has reached her objective. Formally, there is a memoryless strategy $\gamma'_{-i} : S_{-i} \to S$ such that for every finite path $\pi$ of $G$ with $\mathrm{last}(\pi) \in S_{-i}$, if $\pi$ contains a position of $C_i$, then $\gamma_{-i}(\pi) = \gamma'_{-i}(\mathrm{last}(\pi))$.

We now bound the size of $\mathrm{Out}^{G,C_{-i}}(\gamma_{-i}, \lambda_i)$, which will provide a bound on the utility. Let $\pi_{-i} = \mathrm{Out}^{G,C_{-i}}(\gamma_{-i}, \lambda_i)$ and $\pi_i = \mathrm{Out}^{G,C_i}(\lambda_i, \gamma_{-i})$. We consider two cases:

• if $\pi_{-i}$ is a prefix of $\pi_i$: we already know that $\lambda_i$ is $j$-bounded by $2(M^G)^2|S|$, therefore we also get $\mu_\kappa(\pi_{-i}) \le \mu_\kappa(\pi_i) \le 2(M^G)^2|S|$, for all $\kappa = 1,2$;

• if $\pi_i$ is a prefix of $\pi_{-i}$, then $\pi_{-i} = \pi_i\pi'$, for some $\pi'$. Since $\lambda_i$ is $j$-bounded by $2(M^G)^2|S|$, $\mu_\kappa(\pi_i) \le 2(M^G)^2|S|$, for all $\kappa = 1,2$. Since $\gamma_{-i}$ is memoryless after $\pi_i$, there is no loop in $\pi'$. Therefore $\mu_\kappa(\pi') \le |S|M^G$, for all $\kappa = 1,2$. Finally, for all $\kappa = 1,2$, we get $\mu_\kappa(\pi_{-i}) = \mu_\kappa(\pi_i) + \mu_\kappa(\pi') \le (2(M^G)^2 + M^G)|S| \le 3(M^G)^2|S|$.

In both cases, we get $\mu_\kappa(\gamma_{-i}, \lambda_i) \le 3(M^G)^2|S|$, for all $\kappa = 1,2$ and all $\lambda_i \in P_i^{j_0}$. Therefore $0 \le \mathrm{br}_{-i}^G(\lambda_i) \le 3(M^G)^2|S|$ (*), which holds for all $\lambda_i \in P_i^{j_0}$. We also get $\mathrm{reg}_{-i}^{G,j_0+1} \le \mathrm{reg}_{-i}^{G,j_0+1}(\gamma_{-i}) \le 3(M^G)^2|S|$ (**).



Now let $\lambda_{-i}$ be a strategy which minimizes $\mathrm{reg}_{-i}^{G,j_0+1}$, and let $\lambda_i \in P_i^{j_0}$. Let $\pi = \mathrm{Out}^{G,C_{-i}}(\lambda_{-i}, \lambda_i)$. By (**), we have $\mathrm{reg}_{-i}^{G,j_0+1}(\lambda_{-i}, \lambda_i) \le 3(M^G)^2|S|$, i.e.

\[
\mu_{-i}(\pi) - \mathrm{br}_{-i}^{G,P_{-i}^{j_0}}(\lambda_i) \le 3(M^G)^2|S|.
\]

Since $P_{-i}^{j_0} = \Lambda_{-i}(G)$, $\mathrm{br}_{-i}^{G,P_{-i}^{j_0}}(\lambda_i) = \mathrm{br}_{-i}^G(\lambda_i)$. Therefore by (*), we get $\mathrm{br}_{-i}^G(\lambda_i) \le 3(M^G)^2|S|$, and $\mu_{-i}(\pi) \le 6(M^G)^2|S|$. The weights being strictly positive, we get $\mu_i(\pi) \le 6(M^G)^3|S| = b^G$.

Therefore every $j_0$-winning strategy of Player $-i$ which minimizes the $(j_0+1)$-th regret is $j_0$-bounded by $b^G$, and a fortiori every $j$-winning strategy is also $j_0$-bounded by $b^G$, for all $j \ge j_0$. □
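The chain of bounds in this proof is plain arithmetic on $M^G$ and $|S|$. The Python sketch below recomputes it. It assumes that $M^G$ is a positive integer (we read it as the largest weight of $G$) and that $|S|$ is the number of positions; the function name and dictionary keys are hypothetical and chosen only for illustration.

```python
def lemma11_bounds(max_weight: int, num_positions: int) -> dict:
    """Recompute the successive bounds used in the proof of Lemma 11.

    Assumption (not stated in this section): max_weight stands for M^G
    and num_positions for |S|; both are positive integers.
    """
    if max_weight < 1 or num_positions < 1:
        raise ValueError("both arguments must be positive integers")
    m, n = max_weight, num_positions
    return {
        # bound on the strategies minimizing the regret
        "regret_minimizing": 2 * m * n,
        # j-bound for the strategies of P_i^j
        "j_bounded": 2 * m**2 * n,
        # outcome of gamma_{-i} in the second case, before relaxation
        "gamma_outcome": (2 * m**2 + m) * n,
        # relaxed bound, also the right-hand side of (*) and (**)
        "gamma_outcome_relaxed": 3 * m**2 * n,
        # bound on mu_{-i}(pi): (*) and (**) added together
        "utility_minus_i": 6 * m**2 * n,
        # final bound b^G
        "b": 6 * m**3 * n,
    }
```

The relaxation from $(2(M^G)^2 + M^G)|S|$ to $3(M^G)^2|S|$ only needs $M^G \le (M^G)^2$, which holds for every positive integer and is an equality when $M^G = 1$.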

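Lemma 15. Let $i = 1,2$, $j \ge 0$, $\lambda_i \in P_i^j(G)$ and $\lambda_{-i} \in P_{-i}^j(G)$. Let $\rho = \mathrm{Out}^G(\lambda_i, \lambda_{-i})$. If $\rho$ is winning for Player $i$, $\mu_i(\rho) \le B^G$ and $\mu_{-i}(\rho) \le B^G$, then $u_i^G(\lambda_i, \lambda_{-i}) = u_i^{G'}(\Phi(\lambda_i), \Phi(\lambda_{-i}))$.

Proof. We give the proof for Player 1. If $u_1^G(\lambda_1, \lambda_2) \le B^G$, then $\mathrm{Out}^G(\lambda_1, \lambda_2)$ is winning, and we let $\pi = s_0 \dots s_n = \mathrm{Out}^{G,C_1}(\lambda_1, \lambda_2)$. We let $\pi' = (\pi_0) \dots (\pi_n)$ where for all $k \le n$, $\pi_k = s_0 \dots s_k$. Since $\pi$ is bounded, each $\pi_k$ is a position of $G'$. By definition of $\Phi$ we clearly have $\pi' = \mathrm{Out}^{G',C'_1}(\Phi(\lambda_1), \Phi(\lambda_2))$, from which we get $u_1^G(\lambda_1, \lambda_2) = u_1^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$.

If $\mathrm{Out}^{G'}(\Phi(\lambda_1), \Phi(\lambda_2))$ is winning for Player 1, we let $\pi' = \mathrm{Out}^{G',C'_1}(\Phi(\lambda_1), \Phi(\lambda_2))$ and $\pi = \mathrm{last}(\pi_0) \dots \mathrm{last}(\pi_n)$ where $\pi' = \pi_0 \dots \pi_n$. Clearly, $\pi$ is a winning play for Player 1 in $G$ and by definition of $\Phi$, $\pi = \mathrm{Out}^{G,C_1}(\lambda_1, \lambda_2)$, from which we get the equality of the utilities.

□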
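The proof above moves between a play $\pi = s_0 \dots s_n$ of $G$ and the play $\pi' = \pi_0 \dots \pi_n$ of $G'$ whose positions are the prefixes $\pi_k = s_0 \dots s_k$. The following Python sketch illustrates this correspondence only; it assumes positions of $G'$ can be represented as tuples of positions of $G$, and the function names `unfold` and `project` are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Position = Hashable


def unfold(path: Sequence[Position]) -> List[Tuple[Position, ...]]:
    """Map s_0 ... s_n to the sequence of prefixes pi_0 ... pi_n,
    where pi_k = s_0 ... s_k."""
    return [tuple(path[: k + 1]) for k in range(len(path))]


def project(unfolded: Sequence[Tuple[Position, ...]]) -> List[Position]:
    """Map pi_0 ... pi_n back to last(pi_0) ... last(pi_n)."""
    return [prefix[-1] for prefix in unfolded]
```

Applying `project` after `unfold` returns the original path, which mirrors the step "$\pi = \mathrm{last}(\pi_0) \dots \mathrm{last}(\pi_n)$" in the second half of the proof.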

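Lemma 16. For all $i = 1,2$ and all $j \ge 0$, if $\Phi(P_i^j(G)) = P_i^j(G')$ and there is a strategy $j$-bounded by $b^G$ in $P_i^j(G)$, then for all $\lambda_{-i} \in P_{-i}^j(G)$, $\mathrm{br}_i^{G,P_i^j(G)}(\lambda_{-i}) = \mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i}))$.

Proof. Let $\eta_i \in P_i^j(G)$ be a strategy $j$-bounded by $b^G$ (it exists by hypothesis), and let $\lambda_{-i} \in P_{-i}^j(G)$. Since $\eta_i$ is $j$-bounded, it is $j$-winning and by Lemma 15, $u_i^G(\eta_i, \lambda_{-i}) = u_i^{G'}(\Phi(\eta_i), \Phi(\lambda_{-i}))$. Therefore $u_i^{G'}(\Phi(\eta_i), \Phi(\lambda_{-i})) < +\infty$.

Let $\lambda_i \in P_i^j(G)$ be a strategy that realizes the best response $\mathrm{br}_i^{G,P_i^j(G)}(\lambda_{-i})$. Let $\pi = \mathrm{Out}^{G,C_i}(\lambda_i, \lambda_{-i})$. We have $\mu_i(\pi) \le u_i^G(\eta_i, \lambda_{-i}) \le b^G$. Since the weights are strictly positive integers, $|\pi| \le b^G$, and therefore $\mu_{-i}(\pi) \le b^G M^G = B^G$. By Lemma 15, we get $u_i^G(\lambda_i, \lambda_{-i}) = u_i^{G'}(\Phi(\lambda_i), \Phi(\lambda_{-i}))$. Therefore $u_i^G(\lambda_i, \lambda_{-i}) = \mathrm{br}_i^{G,P_i^j(G)}(\lambda_{-i}) \ge \mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i}))$.

Conversely, let $\lambda'_i \in P_i^j(G')$ be a strategy that realizes $\mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i}))$. Therefore $u_i^{G'}(\lambda'_i, \Phi(\lambda_{-i})) \le u_i^{G'}(\Phi(\eta_i), \Phi(\lambda_{-i})) < +\infty$. Since $\Phi(P_i^j(G)) = P_i^j(G')$ by hypothesis, there exists $\lambda_i \in P_i^j(G)$ such that $\Phi(\lambda_i) = \lambda'_i$. Since $u_i^{G'}(\lambda'_i, \Phi(\lambda_{-i}))$ is finite, $\mathrm{Out}^{G'}(\lambda'_i, \Phi(\lambda_{-i}))$ is winning for Player $i$. It is easy to see that $\mathrm{Out}^{G,C_i}(\lambda_i, \lambda_{-i}) = \mathrm{last}(\pi_0) \dots \mathrm{last}(\pi_n)$ where $\pi_0 \dots \pi_n = \mathrm{Out}^{G',C'_i}(\lambda'_i, \Phi(\lambda_{-i}))$ and that they both have the same utility, i.e. $u_i^G(\lambda_i, \lambda_{-i}) = u_i^{G'}(\lambda'_i, \Phi(\lambda_{-i}))$. Since $u_i^{G'}(\lambda'_i, \Phi(\lambda_{-i})) = \mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i}))$, we get $\mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i})) \ge \mathrm{br}_i^{G,P_i^j(G)}(\lambda_{-i})$. □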

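Lemma 17. Let $j \ge 0$, $i = 1,2$ and $\lambda_i \in P_i^j(G)$. If $\Phi(P_{-i}^j(G)) = P_{-i}^j(G')$, then $\lambda_i$ is $j$-bounded by $B^G$ iff $\Phi(\lambda_i)$ is $j$-winning. If $\lambda_i$ is $j$-bounded by $B^G$, then for all $\lambda_{-i} \in P_{-i}^j(G)$, $u_i^G(\lambda_i, \lambda_{-i}) = u_i^{G'}(\Phi(\lambda_i), \Phi(\lambda_{-i}))$.

Proof. Suppose that $\lambda_i \in \Lambda_i(G)$ is $j$-bounded by $B^G$, and let $\lambda'_{-i} \in P_{-i}^j(G')$. Since $\Phi(P_{-i}^j(G)) = P_{-i}^j(G')$, there exists $\lambda_{-i} \in P_{-i}^j(G)$ such that $\Phi(\lambda_{-i}) = \lambda'_{-i}$. Since $\lambda_i$ is $j$-bounded, we are in the conditions of Lemma 15, therefore $u_i^G(\lambda_i, \lambda_{-i}) = u_i^{G'}(\Phi(\lambda_i), \lambda'_{-i}) < +\infty$. Therefore $\Phi(\lambda_i)$ wins against $\lambda'_{-i}$.

Conversely, if $\Phi(\lambda_i)$ is $j$-winning, then let $\lambda_{-i} \in P_{-i}^j(G)$. By hypothesis, $\Phi(\lambda_{-i}) \in P_{-i}^j(G')$. Therefore $\Phi(\lambda_i)$ wins against $\Phi(\lambda_{-i})$. Let $\pi_0 \dots \pi_n = \mathrm{Out}^{G',C'_i}(\Phi(\lambda_i), \Phi(\lambda_{-i}))$. Clearly, by definition of $\Phi$ and $G'$, $\lambda_i$ wins against $\lambda_{-i}$ and $\mathrm{Out}^{G,C_i}(\lambda_i, \lambda_{-i}) = \mathrm{last}(\pi_0) \dots \mathrm{last}(\pi_n)$. □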

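Lemma 18. For all $j \ge 0$ and all $i = 1,2$, if $\Phi(P_i^j(G)) = P_i^j(G')$ then:

(i) $\mathrm{reg}_i^{G,j+1} = +\infty$ iff $\mathrm{reg}_i^{G',j+1} = +\infty$
(ii) for all $\lambda_i \in P_i^{j+1}(G) \cup \Phi^{-1}(P_i^{j+1}(G'))$, $\mathrm{reg}_i^{G,j+1}(\lambda_i) = \mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i))$
(iii) $\Phi(P_i^{j+1}(G)) = P_i^{j+1}(G')$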

Proof. (i) If $\mathrm{reg}_i^{G,j+1} < +\infty$, then there is a $j$-winning strategy $\lambda_i \in P_i^j(G)$. By Lemma 17, $\Phi(\lambda_i)$ is $j$-winning. Since by hypothesis $\Phi(P_i^j(G)) = P_i^j(G')$, we have $\Phi(\lambda_i) \in P_i^j(G')$, so that $\mathrm{reg}_i^{G',j+1} \le \mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i)) < +\infty$.

Conversely, if $\mathrm{reg}_i^{G',j+1} < +\infty$, there is a $j$-winning strategy $\lambda'_i \in P_i^j(G')$. By hypothesis, there is $\lambda_i \in P_i^j(G)$ such that $\Phi(\lambda_i) = \lambda'_i$, and by Lemma 11, $\lambda_i$ is $j$-bounded, and in particular $j$-winning. Therefore $\mathrm{reg}_i^{G,j+1} < +\infty$.

(ii) The proof is in two parts, depending on whether $\mathrm{reg}_i^{G,j+1}$ is finite or not.

(ii).a If $\mathrm{reg}_i^{G,j+1} = +\infty$, then by (i), $\mathrm{reg}_i^{G',j+1} = +\infty$.

Let $\lambda_i \in P_i^{j+1}(G)$. Since $P_i^{j+1}(G) \subseteq P_i^j(G)$, $\lambda_i \in P_i^j(G)$. By hypothesis, $\Phi(P_i^j(G)) = P_i^j(G')$, therefore $\Phi(\lambda_i) \in P_i^j(G')$. Therefore $\mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i)) = +\infty = \mathrm{reg}_i^{G,j+1}(\lambda_i)$.

If $\lambda_i \in \Phi^{-1}(P_i^{j+1}(G'))$, then $\mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i)) = +\infty$. Since $P_i^{j+1}(G') \subseteq P_i^j(G')$, $\Phi(\lambda_i) \in P_i^j(G')$. By hypothesis, $\Phi(P_i^j(G)) = P_i^j(G')$, therefore $\lambda_i \in P_i^j(G)$, and $\mathrm{reg}_i^{G,j+1}(\lambda_i) = +\infty = \mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i))$, since $\mathrm{reg}_i^{G,j+1} = +\infty$.

(ii).b If $\mathrm{reg}_i^{G,j+1} < +\infty$, then by (i), $\mathrm{reg}_i^{G',j+1} < +\infty$. Let $\lambda_i \in P_i^{j+1}(G) \cup \Phi^{-1}(P_i^{j+1}(G'))$. We prove that for all $\lambda_{-i} \in P_{-i}^j(G)$,

1. $u_i^G(\lambda_i, \lambda_{-i}) = u_i^{G'}(\Phi(\lambda_i), \Phi(\lambda_{-i}))$

2. $\mathrm{br}_i^{G,P_i^j(G)}(\lambda_{-i}) = \mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i}))$

For 1, if $\lambda_i \in P_i^{j+1}(G)$, then since $\mathrm{reg}_i^{G,j+1} < +\infty$, by Lemma 11 $\lambda_i$ is $j$-bounded by $b^G$, and a fortiori by $B^G$. By Lemma 17 we get the result. If $\lambda_i \in \Phi^{-1}(P_i^{j+1}(G'))$, then since $\mathrm{reg}_i^{G',j+1} < +\infty$, $\Phi(\lambda_i) \in P_i^j(G')$ is necessarily winning. By Lemma 17, $\lambda_i$ is $j$-bounded by $B^G$ and again by the same lemma, we get the result.

For 2, since $\mathrm{reg}_i^{G,j+1} < +\infty$, by Lemma 11 there is a strategy of Player $i$ $j$-bounded by $B^G$ in $P_i^j(G)$. Therefore we can apply Lemma 16 and we get the result.

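Finally we have:

\[
\begin{aligned}
\mathrm{reg}_i^{G,j+1}(\lambda_i)
&= \max_{\lambda_{-i} \in P_{-i}^j(G)} \big[ u_i^G(\lambda_i, \lambda_{-i}) - \mathrm{br}_i^{G,P_i^j(G)}(\lambda_{-i}) \big] \\
&= \max_{\lambda_{-i} \in P_{-i}^j(G)} \big[ u_i^{G'}(\Phi(\lambda_i), \Phi(\lambda_{-i})) - \mathrm{br}_i^{G',P_i^j(G')}(\Phi(\lambda_{-i})) \big] && \text{(by (1) and (2))} \\
&= \max_{\lambda'_{-i} \in P_{-i}^j(G')} \big[ u_i^{G'}(\Phi(\lambda_i), \lambda'_{-i}) - \mathrm{br}_i^{G',P_i^j(G')}(\lambda'_{-i}) \big] && \text{(since } \Phi(P_{-i}^j(G)) = P_{-i}^j(G') \text{ by hypothesis)} \\
&= \mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i))
\end{aligned}
\]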
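The first line of this chain is the definition of the regret of a strategy relative to restricted strategy sets. As an illustration only, the Python sketch below evaluates that definition on an explicit finite cost table instead of a game graph. The table, the strategy names, and the functions `best_response`, `regret` and `regret_minimizers` are hypothetical, and all costs are assumed finite so that no convention for $+\infty$ is needed.

```python
from typing import Dict, Set, Tuple

# cost[(own_strategy, opponent_strategy)] is the utility u_i, read as a
# cost that Player i wants to minimize (assumption: all entries finite).
Cost = Dict[Tuple[str, str], int]


def best_response(cost: Cost, own: Set[str], opponent: str) -> int:
    """br_i restricted to `own`: the least cost achievable against `opponent`."""
    return min(cost[(s, opponent)] for s in own)


def regret(cost: Cost, own: Set[str], others: Set[str], strategy: str) -> int:
    """max over opponent strategies of u_i(strategy, o) - br_i(o)."""
    return max(
        cost[(strategy, o)] - best_response(cost, own, o) for o in others
    )


def regret_minimizers(cost: Cost, own: Set[str], others: Set[str]):
    """Return the strategies of `own` with least regret, and that regret."""
    values = {s: regret(cost, own, others, s) for s in own}
    least = min(values.values())
    return {s for s, v in values.items() if v == least}, least


# Hypothetical 2x2 example.
example_cost: Cost = {
    ("a", "x"): 1, ("a", "y"): 5,
    ("b", "x"): 3, ("b", "y"): 4,
}
survivors, value = regret_minimizers(example_cost, {"a", "b"}, {"x", "y"})
```

In this example the best responses are 1 against `x` and 4 against `y`, so `a` has regret 1 and `b` has regret 2. Calling `regret_minimizers` again with the surviving sets of both players gives one more round of the iteration.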



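(iii) Let $i \in \{1,2\}$ and $\lambda_i \in P_i^{j+1}(G)$. Suppose that $\Phi(\lambda_i) \notin P_i^{j+1}(G')$. It means that $\Phi(\lambda_i)$ does not minimize the $(j+1)$-th regret in $G'$. Therefore there exists another strategy $\lambda'_i \in P_i^{j+1}(G')$ such that $\mathrm{reg}_i^{G',j+1}(\lambda'_i) < \mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i))$. By (ii), we get $\mathrm{reg}_i^{G,j+1}(\gamma_i) < \mathrm{reg}_i^{G,j+1}(\lambda_i)$, for all $\gamma_i \in \Phi^{-1}(\lambda'_i)$. Since $P_i^{j+1}(G') \subseteq P_i^j(G')$ and $\Phi(P_i^j(G)) = P_i^j(G')$, we have $\Phi^{-1}(\lambda'_i) \subseteq P_i^j(G)$, and we get a contradiction with the minimality of $\lambda_i$.

Conversely, let $\lambda'_i \in P_i^{j+1}(G')$. Suppose that $\lambda'_i \notin \Phi(P_i^{j+1}(G))$. Since $\lambda'_i \in P_i^j(G')$, by hypothesis, $\lambda'_i \in \Phi(P_i^j(G))$. Therefore there exists $\lambda_i \in P_i^j(G)$ such that $\Phi(\lambda_i) = \lambda'_i$ but $\lambda_i \notin P_i^{j+1}(G)$. It means that $\lambda_i$ did not survive the $j$-th iteration. In other words, for every strategy $\gamma_i \in P_i^{j+1}(G)$, $\mathrm{reg}_i^{G,j+1}(\gamma_i) < \mathrm{reg}_i^{G,j+1}(\lambda_i)$. Since $\lambda_i \in \Phi^{-1}(P_i^{j+1}(G'))$, by (ii) we have $\mathrm{reg}_i^{G,j+1}(\lambda_i) = \mathrm{reg}_i^{G',j+1}(\Phi(\lambda_i)) = \mathrm{reg}_i^{G',j+1}(\lambda'_i)$. By (ii), we also have $\mathrm{reg}_i^{G,j+1}(\gamma_i) = \mathrm{reg}_i^{G',j+1}(\Phi(\gamma_i))$. Therefore $\mathrm{reg}_i^{G',j+1}(\Phi(\gamma_i)) < \mathrm{reg}_i^{G',j+1}(\lambda'_i)$. Since $\gamma_i \in P_i^j(G)$ and by hypothesis $\Phi(P_i^j(G)) = P_i^j(G')$, we have $\Phi(\gamma_i) \in P_i^j(G')$. Therefore we get a strategy $\Phi(\gamma_i)$ of $P_i^j(G')$ with a lower $(j+1)$-th regret than the $(j+1)$-th regret of $\lambda'_i$. This contradicts $\lambda'_i \in P_i^{j+1}(G')$. □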

Lemma 13

Proof. Clearly, $\Phi(\Lambda_i(G)) = \Lambda_i(G')$. Therefore we can apply Lemma 18 (proved in Appendix) so that items (i), (ii) and (iii) hold at rank 0. In particular, $\Phi(P_i^1(G)) = P_i^1(G')$. Therefore we can again apply Lemma 18 at rank 1. More generally, for all $j \ge 1$, we have:

1. for all $\lambda_i \in P_i^j(G)$, $\mathrm{reg}_i^{G,j}(\lambda_i) = \mathrm{reg}_i^{G',j}(\Phi(\lambda_i))$;

2. $\Phi(P_i^j(G)) = P_i^j(G')$.

□